INTRODUCTION


The education of the mathematics major begins with the study 
of three basic disciplines: mathematical analysis, analytic geo- 
metry and higher algebra. These disciplines have a number of points 
of contact, some of which overlap; together they constitute the 
foundation upon which rests the whole edifice of modern mathema- 
tical science. 

Higher algebra—the subject of this text—is a far-reaching 
and natural generalization of the basic school course of elementary 
algebra. Central to elementary algebra is without doubt the problem 
of solving equations. The study of equations begins with the very 
simple case of one equation of the first degree in one unknown. 
From there on, the development proceeds in two directions: to 
systems of two and three equations of the first degree in two and, 
respectively, three unknowns, and to a single quadratic equation 
in one unknown and also to a few special types of higher-degree 
equations which readily reduce to quadratic equations (quartic 
equations, for example).

Both trends are further developed in the course of higher algebra, 
thus determining its two large areas of study. One—the foundations 
of linear algebra—starts with the study of arbitrary systems of 
equations of the first degree (linear equations). When the number 
of equations equals the number of unknowns, solutions of such 
systems are obtained by means of the theory of determinants. Howe- 
ver, the theory proves insufficient when studying systems of linear 
equations in which the number of equations is not equal to the 
number of unknowns. This is a novel feature from the standpoint 
of elementary algebra, but it is very important in practical appli- 
cations. This stimulated the development of the theory of matrices, 
which are systems of numbers arranged in square or rectangular 
arrays made up of rows and columns. Matrix theory proved to be 
very deep and has found application far beyond the limits of the 
theory of systems of linear equations. On the other hand, investiga- 
tions into systems of linear equations gave rise to multidimensional 
(so-called vector or linear) spaces. To the nonmathematician, mul- 
tidimensional space (four-dimensional, to begin with) is a nebulous 
and often confusing concept. Actually, however, the notion is 
a strictly mathematical one, mainly algebraic, and serves as an 
important tool in a variety of mathematical investigations and 
also in physics. 

The second half of the course of higher algebra, called the algebra 
of polynomials, is devoted to the study of a single equation in one 
unknown but of arbitrary degree. Since there is a formula for solving 
quadratic equations, it was natural to seek similar formulas for
higher-degree equations. That is precisely how this division of
algebra developed historically. Formulas for solving equations 
of third and fourth degree were found in the sixteenth century. 
The search was then on for formulas capable of expressing the roots 
of equations of fifth and higher degree in terms of the coefficients
of the equations by means of radicals, even radicals within radicals. 
It was futile, though it continued up to the beginning of the nine- 
teenth century, when it was proved that no such formulas exist 
and that for all degrees beyond the fourth there even exist specific 
examples of equations with integral coefficients whose roots cannot
be written down by means of radicals. 

One should not be saddened by this absence of formulas for 
solving equations of higher degrees, for even in the case of third 
and fourth degree equations, where such formulas exist, computa- 
tions are extremely involved and, in a practical sense, almost useless. 
On the other hand, the coefficients of equations one encounters in 
physics and engineering are usually quantities obtained in
measurements. These are approximations, and therefore the roots need
only be known approximately, to within a specified accuracy. This 
led to the elaboration of a variety of methods of approximate solu- 
tion of equations; only the most elementary methods are given 
in the course of higher algebra. 

However, in the algebra of polynomials the main thing is not 
the problem of finding the roots of equations, but the problem of 
their existence. For example, we even know of quadratic equations 
with real coefficients that do not have real-valued roots. By extending 
the range of numbers to include the collection of complex numbers, 
we find that quadratic equations do have roots and that this holds 
true for equations of the third and fourth degree as well, as follows 
from the existence of formulas for their solution. But perhaps there 
are equations of the fifth and higher degree without a single root 
even in the class of complex numbers. Will it not be necessary, 
when seeking the roots of such equations, to pass from complex 
numbers to a still bigger class of numbers? The answer to this ques- 
tion is contained in an important theorem which asserts that any 
equation with numerical coefficients, whether real or complex, has 
complex-valued (real-valued, as a special case) roots; and, generally 
speaking, the number of roots is equal to the degree of the equation. 

Such, in brief, is the basic content of the course of higher algebra. 
It must be stressed that higher algebra is only the starting point of 
the vast science of algebra which is very rich, extremely ramified 
and constantly expanding. Let us attempt, even more sketchily, 
to survey the various branches of algebra which, in the main, lie 
beyond the scope of the course of higher algebra. 

Linear algebra, which is a broad field devoted mainly to the 
theory of matrices and the associated theory of linear transformations
of vector spaces, includes also the theory of forms, the theory
of invariants and tensor algebra, which plays an important role 
in differential geometry. The theory of vector spaces is further 
developed outside the scope of algebra, in functional analysis 
(infinite-dimensional spaces). Linear algebra continues, so far, 
to occupy first place among the numerous branches of algebra as to 
diversity and significance of its applications in mathematics, physics 
and the engineering sciences. 

The algebra of polynomials, which over many decades has 
been growing as a science concerned with one equation of arbitrary 
degree in one unknown, has now in the main completed its develop- 
ment. It was further developed in part in certain divisions of the 
theory of functions of a complex variable, but basically grew into 
the theory of fields, which we will speak of later on. Now the very 
difficult problem of systems of equations of arbitrary degree (not
linear) in several unknowns, which embraces both divisions of the
course of higher algebra and is hardly touched on in this text,
actually has to do with a special branch of mathematics called algebraic
geometry. 

An exhaustive treatment of the problem of the conditions under 
which an equation can be solved in terms of radicals was given 
by the French mathematician Galois (1811-1832). His investigations
pointed out new vistas in the development of algebra and led,
in the twentieth century, after the work of the German woman- 
algebraist E. Noether (1882-1935), to the establishment of a fresh 
viewpoint on the problems of algebraic science. There is no doubt 
now that the central problem of algebra is not the study of equa- 
tions. The true subject of algebraic study is algebraic operations, 
like those of addition and multiplication of numbers, but possibly 
involving entities other than numbers. 

In school physics one deals with the operation of composition 
of forces. The mathematical disciplines studied in the junior courses 
of universities and teachers’ colleges provide numerous examples 
of algebraic operations: the addition and multiplication of matrices 
and functions, operations involving vectors, transformations of 
space, etc. These operations are usually similar to those involving 
numbers and bear the same names, but occasionally some of the 
properties which are customary in the case of numbers are lost. 
Thus, very often and in very important instances, the operations 
prove to be noncommutative (a product is dependent on the order 
of the factors), at times even nonassociative (a product of three 
factors depends on the placing of parentheses). 
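
Both phenomena are easy to observe on small examples. The sketch below is illustrative only: it multiplies two square matrices of order 2 in both orders, and it evaluates a double vector (cross) product with the parentheses placed in two different ways. The vector product is our own choice of a nonassociative operation, and the helper names `mat_mul` and `cross` are assumptions made for this sketch.

```python
# Illustrative sketch: matrices are nested lists, vectors are plain lists.

def mat_mul(a, b):
    """Product of two square matrices of order 2."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cross(u, v):
    """Vector (cross) product of two three-dimensional vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
ab = mat_mul(A, B)   # [[2, 1], [1, 1]]
ba = mat_mul(B, A)   # [[1, 1], [1, 2]]: the product depends on the order

i, j = [1, 0, 0], [0, 1, 0]
left = cross(cross(i, i), j)    # (i x i) x j = [0, 0, 0]
right = cross(i, cross(i, j))   # i x (i x j) = [0, -1, 0]
```

The two matrix products differ, and so do the two ways of bracketing the triple vector product.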

A very systematic study has been made of a few of the most 
important types of algebraic systems (or structures), that is, sets 
composed of entities of a certain nature for which certain algebraic 
operations have been defined. Such, for example, are fields. These
are algebraic systems in which, as in the systems of real and complex
numbers, the operations of addition and multiplication are defined,
both commutative and associative, connected by the distributive law
(the ordinary rule of removing brackets holds) and possessing
the inverse operations of subtraction and division. The theory
of fields was a natural area for the further development of the theory 
of equations, while its principal branches—the theory of fields of 
algebraic numbers and the theory of fields of algebraic functions— 
linked it up with the theory of numbers and the theory of functions 
of a complex variable, respectively. The present course of higher 
algebra includes an elementary introduction to the theory of fields, 
and some portions of the course—polynomials in several unknowns, 
the normal form of a matrix—are presented directly for the case 
of an arbitrary base field. 

Broader than a field is the concept of a ring. Unlike the field, 
division is not required here and, besides, multiplication may be
noncommutative and even nonassociative. The simplest instances 
of rings are the set of all integers (including negative numbers), 
the set of polynomials in one unknown and the set of real-valued 
functions of a real variable. The theory of rings includes such old 
branches of algebra as the theory of hypercomplex numbers and 
the theory of ideals. It is related to a number of mathematical 
sciences (functional analysis being one) and has already made 
inroads into physics. The course of higher algebra actually contains 
only the definition of a ring. 

Still greater in its range of applications is the theory of groups. 
A group is an algebraic system with one basic operation, which 
must be associative but not necessarily commutative, and must 
possess an inverse operation (division if the basic operation is mul- 
tiplication). Such, for example, is the set of integers with respect 
to the operation of addition and also the set of positive real num- 
bers with respect to the operation of multiplication. Groups were 
already important in the theory of Galois, in the problem of the 
solvability of equations in terms of radicals; today groups are a power- 
ful tool in the theory of fields, in many divisions of geometry, in 
topology, and also outside mathematics (in crystallography and 
theoretical physics). Generally speaking, within the sphere of 
algebra, group theory takes second place after linear algebra as to 
its range of applications. Our course of higher algebra contains 
a chapter on the fundamentals of the theory of groups. 

In recent decades an entirely new branch of algebra—lattice 
theory—has come to the fore. A lattice is an algebraic system with 
two operations—addition and multiplication. These operations 
must be commutative and associative and must also satisfy the 
following requirements: both the sum and the product of an element 
with itself must be equal to the element; if the sum of two elements
is equal to one of them, then the product is equal to the other, and
conversely. An example of a lattice is the system of natural num- 
bers relative to the operations of taking the least common multiple 
and the greatest common divisor. Lattice theory has interesting 
ties with the theory of groups and the theory of rings, and also 
with the theory of sets; one old branch of geometry (projective 
geometry) actually proved to be a part of lattice theory.
It is also worth mentioning the expansion of lattice theory into 
the theory of electric circuits. 
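
The lattice requirements can be checked mechanically for the natural numbers, taking the least common multiple as the "sum" and the greatest common divisor as the "product". The sketch below tests them only on the numbers 1 through 20; it is a finite check and not a proof, and the function names are our own.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two natural numbers."""
    return a * b // gcd(a, b)

def check_lattice_laws(limit):
    """Check the lattice requirements for lcm ("sum") and gcd ("product")
    on the natural numbers 1, 2, ..., limit."""
    nums = range(1, limit + 1)
    for a in nums:
        # The sum and the product of an element with itself equal the element.
        if lcm(a, a) != a or gcd(a, a) != a:
            return False
        for b in nums:
            # Both operations are commutative.
            if lcm(a, b) != lcm(b, a) or gcd(a, b) != gcd(b, a):
                return False
            # The sum equals one element exactly when the product equals the other.
            if (lcm(a, b) == a) != (gcd(a, b) == b):
                return False
            for c in nums:
                # Both operations are associative.
                if lcm(lcm(a, b), c) != lcm(a, lcm(b, c)):
                    return False
                if gcd(gcd(a, b), c) != gcd(a, gcd(b, c)):
                    return False
    return True

laws_hold = check_lattice_laws(20)
```

For instance, the "sum" of 4 and 6 is 12 and their "product" is 2, while the "sum" of 12 and 4 is 12 and their "product" is 4, in agreement with the last requirement.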

Certain similarities between parts of the theories of groups, 
rings and lattices led to the development of a general theory of 
algebraic systems (or universal algebras). The theory has only taken 
a few steps but its general outlines are evident and certain links 
with mathematical logic that have been perceived point to a rich 
future in this area. 

The foregoing scheme does not of course embrace the whole 
range of algebraic science. For one thing, there are a number of 
divisions of algebra bordering on other areas of mathematics, such 
as topological algebra, which deals with algebraic systems in which 
the operations are continuous relative to some convergence defined 
for the elements of the systems. An example is the system of real 
numbers. Closely related to topological algebra is the theory of 
continuous (or Lie) groups, which has found numerous applica- 
tions in a broad range of geometrical problems, in theoretical physics 
and hydrodynamics. Incidentally, the theory of Lie groups is chara- 
cterized by such an interweaving of algebraic, topological, geome- 
tric and function-theoretic methods as to be more properly conside- 
red a special branch of mathematics altogether. Next we have the 
theory of ordered algebraic systems which arose out of investigations 
into the fundamentals of geometry and has found applications 
in functional analysis. Finally, there is differential algebra which 
has established fresh relationships between algebra and the theory 
of differential equations. 

Quite naturally, the flowering of algebraic science so evident 
today is not accidental, but is an organic part of the general advance 
of mathematics and is due, in large measure, to the demands made 
upon algebra by the other mathematical sciences. On the other hand, 
the development of algebra itself has exerted a far-reaching influence 
on the elaboration of allied branches of science; this influence has 
been particularly enhanced by the spread of applications so chara- 
cteristic of modern algebra. One is often tempted to speak of an 
“algebraization” of mathematics. 

We conclude this rather sketchy survey of algebra with a gene- 
ral historical background. 

Babylonian and, later, ancient Greek mathematicians studied 
certain problems of algebra, in particular the solution of simple
equations. The peak of algebraic investigations during this period 
was reached in the works of the Greek mathematician Diophantos 
of Alexandria (third century). These studies were then extended by 
mathematicians of India: Aryabhata (sixth century), Brahmagupta 
(seventh century), and Bhaskara (twelfth century). In China, alge- 
braic problems got an early start: Ch’ang Ts’ang (second century 
B.C.), Ching Chou-chan (first century A.D.). An outstanding Chinese 
algebraist was Ch’in Chiu-shao (thirteenth century). 

A major contribution to the development of algebra was made 
by scholars of the Middle East whose writings were in Arabic,
particularly the Uzbek scholar Muhammad al-Khowarizmi (ninth
century) and the Tajik mathematician and poet Omar Khayyam
(1040-1123). In particular, the very term “algebra” came from the title
of al-Khowarizmi’s treatise Hisab al-jabr w’al-muqabalah.

The above-mentioned studies of Babylonian, Greek, Indian, 
Chinese, and Central-Asian algebraists have to do with those problems
of algebra which constitute the present school course of elementary
algebra and only occasionally touch on equations of the
third degree. That, in the main, was the range of problems that 
interested medieval European algebraists and those of the Renais- 
sance, such as the Italian mathematician Leonardo of Pisa (Fibo- 
nacci) (twelfth century) and the founder of present-day algebraic 
symbolism, the Frenchman Vieta (or Viète) (1540-1603). We have
already mentioned that in the sixteenth century methods were 
found for solving equations of the third and fourth degree; here 
we must mention the names of the Italians Ferro (1465-1526), Tartaglia
(1500-1557), Cardano (1501-1576) and Ferrari (1522-1565).

The seventeenth and eighteenth centuries saw an intensive ela- 
boration of the general theory of equations (or the algebra of poly- 
nomials) in which outstanding scholars of the time participated: 
Descartes (1596-1650), Sir Isaac Newton (1643-1727), d’Alembert 
(1717-1783) and Lagrange (1736-1813). In the eighteenth century, 
the Swiss mathematician Cramer (1704-1752) and Laplace (1749- 
1827) of France, laid the foundation of the theory of determinants. 
At the turn of the century, the great German mathematician Gauss 
(1777-1855) proved the earlier mentioned fundamental theorem on 
the existence of roots of equations with numerical coefficients. 

The first third of the nineteenth century stands out in the history 
of algebra as the time when the problem of the solvability of equa- 
tions by radicals was resolved. Proof of the impossibility of obtain- 
ing formulas for the solution of equations of degree five or higher was 
obtained by the Italian mathematician Ruffini (1765-1822) and in 
more rigorous form by the Norwegian Abel (1802-1829). As already 
mentioned, an exhaustive treatment of the problem of the conditions 
under which an equation admits of solution in terms of radicals 
was given by Galois.

Galois’ theory spurred the advance of algebra in the latter half
of the nineteenth century. There appeared the theory of fields of 
algebraic numbers and of fields of algebraic functions and the asso- 
ciated theory of ideals. Here, mention should be made of the German 
mathematicians Kummer (1810-1893), Kronecker (1823-1891), and 
Dedekind (1831-1916), and the Russian mathematicians E. I. Zolotarev
(1847-1878) and G. F. Voronoi (1868-1908). Particular advances
were made in the theory of finite groups which grew out of the research 
of Lagrange and Galois; this work was carried out by the French 
mathematicians Cauchy (1789-1857) and Jordan (1838-1922), the 
Norwegian Sylow (1832-1918), the German algebraists Frobenius 
(1849-1918) and Hölder (1859-1937). The investigations of the Norwegian
S. Lie (1842-1899) initiated the theory of continuous groups.

The works of Hamilton (1805-1865) and the German mathematician
Grassmann (1809-1877) laid the foundations for the theory
of hypercomplex systems or, as we now say, the theory of algebras. 
A prominent role in the development of this branch of algebra was 
played (at the end of the century) by the Russian mathematician 
F. E. Molin (1864-1941). 

Linear algebra attained great heights in the nineteenth century 
primarily due to the work of the English mathematicians Sylvester 
(1814-1897) and Cayley (1821-1895). Work continued on the algebra 
of polynomials; we note only the method of approximate solution 
of equations found by the Russian geometer N. I. Lobachevsky 
(1792-1856) and the work of the German Hurwitz (1859-1919). Algebraic
geometry was begun in the latter part of the nineteenth century,
particularly in the works of the German mathematician M. Noether 
(1844-1922). 

In the twentieth century, algebraic studies expanded considerab- 
ly and algebra, as we already know, occupies a very special place 
of honour in mathematics. New divisions of algebra have sprung 
up, including the general theory of fields (in the 1910’s), the theory 
of rings and the general theory of groups (1920's), topological algebra 
and lattice theory (1930's), the theory of semigroups and the theory 
of quasigroups, the theory of universal algebras, homological algebra, 
the theory of categories (all in the 1940’s and 1950’s). Prominent 
mathematicians are presently engaged in all spheres of algebra, and 
in a number of countries (in the Soviet Union, for example) whole 
schools of algebra are in evidence. 

Among the prerevolutionary Russian algebraists, noteworthy 
contributions to algebra were also made by S.O. Shatunovsky 
(1859-1929) and D. A. Grave (1863-1939). However, it was only 
after the Great October Revolution of 1917 that algebraic investiga- 
tions in the Soviet Union reached high peaks. These studies now 
embrace practically all divisions of modern algebraic science and
in some the work of Soviet algebraists is of a leading nature. Suffice
it to name only two algebraists: N. G. Chebotarev (1894-1947), who
worked in the theory of fields and Lie groups, and O. Yu. Schmidt 
(1891-1956), the famous polar explorer who was also a noted algeb- 
raist and founded the Soviet school of group theory. 

We conclude this brief survey of the historical background and 
modern state of algebra with the remark that most of the fields of 
research mentioned here lie beyond the scope of the present course
of higher algebra. The aim of the survey was to help the reader to
find the proper place for this text in algebraic science as a whole
and within the edifice of mathematics.


CHAPTER 1 


SYSTEMS 
OF LINEAR EQUATIONS. 
DETERMINANTS 


1. The Method of Successive Elimination of Unknowns 


We begin the course of higher algebra with a study of systems 
of first-degree equations in several unknowns or, to use the more 
common term, systems of linear equations.* 

The theory of systems of linear equations serves as the foundation 
for a vast and important division of algebra—linear algebra—to 
which a good portion of this book is devoted (the first three chapters 
in particular). The coefficients of the equations considered in these 
three chapters, the values of the unknowns and, generally, all num- 
bers that will be encountered are to be considered real. Incidentally, 
all the material of these three chapters is readily extendable to the
case of arbitrary complex numbers which are familiar from elemen- 
tary mathematics. 

In contrast to elementary algebra, we will study systems with
an arbitrary number of equations and unknowns; at times, the
number of equations of a system will not even be assumed to coincide
with the number of unknowns. Suppose we have a system of $s$ linear
equations in $n$ unknowns. Let us agree to use the following symbolism:
the unknowns will be denoted by $x$ with subscripts: $x_1, x_2, \dots, x_n$;
we will consider the equations to be enumerated thus: first, second,
..., $s$th; the coefficient of $x_j$ in the $i$th equation will
be given as $a_{ij}$.** Finally, the constant term of the $i$th equation will
be indicated as $b_i$.


* The term “linear” stems from analytic geometry, where a first-degree 
equation in two unknowns defines a straight line in a plane. 

** We thus use two subscripts: the first indicates the position number of
the equation, the second the position number of the unknown. They are to be
read: $a_{11}$, “a sub one one” and not “a eleven”; $a_{34}$, “a sub three four” and not
“a thirty-four”, and are not separated by a comma.

Our system of equations will now be written as follows:

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2,\\
& \vdots\\
a_{s1}x_1 + a_{s2}x_2 + \dots + a_{sn}x_n &= b_s.
\end{aligned}
\tag{1}
$$

The coefficients of the unknowns form a rectangular array: 
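
$$
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{s1} & a_{s2} & \dots & a_{sn}
\end{pmatrix},
\tag{2}
$$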

called a matrix of $s$ rows and $n$ columns; the numbers $a_{ij}$ are termed
elements of the matrix.* If $s = n$ (which means the number of rows
is equal to the number of columns), then the matrix is called a square
matrix of order $n$. The diagonal of the matrix from upper left corner
to lower right corner (i.e., composed of the elements $a_{11}, a_{22}, \dots, a_{nn}$)
is called the principal diagonal. We call a square matrix of order
$n$ a unit matrix of order $n$ if all the elements of its principal diagonal
are equal to unity and all other elements are zero.
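
As a small illustration of these definitions, a matrix may be stored row by row as a list of lists. The sketch below is only one possible representation; the names `unit_matrix` and `principal_diagonal` are hypothetical, and Python numbers rows and columns from 0, whereas the text numbers them from 1.

```python
def unit_matrix(n):
    """Square matrix of order n with ones on the principal diagonal
    and zeros everywhere else."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def principal_diagonal(matrix):
    """Elements a_11, a_22, ..., a_nn of a square matrix."""
    return [matrix[i][i] for i in range(len(matrix))]

E3 = unit_matrix(3)
diag = principal_diagonal([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])
```

Here `E3` is the unit matrix of order 3, and `diag` collects the elements 1, 5, 9 of the principal diagonal of the sample matrix.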

The solution of the system of linear equations (1) is a set of $n$
numbers $k_1, k_2, \dots, k_n$ such that each of the equations (1) becomes
an identity upon substitution of the corresponding numbers $k_i$,
$i = 1, 2, \dots, n$, for the unknowns $x_i$.**


A system of linear equations may not have any solutions; it is 
then called inconsistent. Such, for example, is the system 


$$
\begin{aligned}
x_1 + 5x_2 &= 1,\\
x_1 + 5x_2 &= 7.
\end{aligned}
$$


The left members of these equations coincide, but the right members 
are different and so no set of values of the unknowns can satisfy 
both equations simultaneously. 

If a system of linear equations has solutions, it is termed con- 
sistent. A consistent system is called determinate if it has a unique 
solution—only such are considered in elementary algebra—and inde- 
terminate if there are more solutions than one. As we shall learn 
later on, there may even be an infinity of solutions. For instance, 


* Thus, if the matrix (2) is regarded by itself (not connected with the 
system (1)), then the first subscript of element $a_{ij}$ indicates the number of the
row, the second the number of the column at the intersection of which the 
element is positioned. 

** We stress the fact that the numbers $k_1, k_2, \dots, k_n$ constitute a single
solution of the system and not n solutions. 

the system

$$
\begin{aligned}
x_1 + 2x_2 &= 7,\\
x_1 + x_2 &= 4
\end{aligned}
$$

is determinate: it has the solution $x_1 = 1$, $x_2 = 3$ and, as may readily
be verified by the method of elimination, this solution is unique.
On the other hand, the system

$$
\begin{aligned}
3x_1 - x_2 &= 1,\\
6x_1 - 2x_2 &= 2
\end{aligned}
$$

is indeterminate since it has infinitely many solutions of the form

$$
x_1 = k, \qquad x_2 = 3k - 1,
\tag{3}
$$

where k is an arbitrary number; the solutions obtained using for- 
mulas (3) exhaust the solutions of the system. 
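
The claim is easy to test numerically. The sketch below assumes that the indeterminate system in question is $3x_1 - x_2 = 1$, $6x_1 - 2x_2 = 2$ and that formulas (3) read $x_1 = k$, $x_2 = 3k - 1$; it substitutes a few sample values of $k$ (chosen arbitrarily) into both equations. The function name `satisfies` is our own.

```python
from fractions import Fraction

def satisfies(k):
    """Substitute x1 = k, x2 = 3k - 1 into both equations of the system."""
    x1, x2 = k, 3 * k - 1
    return 3 * x1 - x2 == 1 and 6 * x1 - 2 * x2 == 2

# A few arbitrary values of k, including a fractional one.
samples = [Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)]
all_ok = all(satisfies(k) for k in samples)
```

Conversely, the first equation gives $x_2 = 3x_1 - 1$ for any solution, so every solution is obtained by taking $k = x_1$; this is why no solutions are missed.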

The problem of the theory of systems of linear equations consists 
in elaborating methods to determine whether a given system of equa- 
tions is consistent or not and, in the case of consistency, to establish 
the number of solutions and also to indicate a procedure for finding 
the solutions. 

We begin with the most convenient practical method for finding 
solutions to systems with numerical coefficients, namely, the method 
of successive elimination of unknowns, or Gauss’ method. 

First, a preliminary remark. In future we will manipulate systems 
of equations in the following manner: both members of one of the 
equations of the system multiplied by one and the same number will 
be subtracted from the corresponding members of some other equation 
of the system. For the sake of definiteness, let us subtract both 
members of the first equation of system (1), multiplied by a number 
c, from the corresponding members of the second equation. We obtain 
a new system of linear equations: 


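$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1,\\
a'_{21}x_1 + a'_{22}x_2 + \dots + a'_{2n}x_n &= b'_2,\\
a_{31}x_1 + a_{32}x_2 + \dots + a_{3n}x_n &= b_3,\\
& \vdots\\
a_{s1}x_1 + a_{s2}x_2 + \dots + a_{sn}x_n &= b_s,
\end{aligned}
\tag{4}
$$

where

$$
a'_{2j} = a_{2j} - c\,a_{1j} \quad \text{for } j = 1, 2, \dots, n, \qquad b'_2 = b_2 - c\,b_1.
$$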

The systems (1) and (4) are equivalent, which is to say they are
either both inconsistent or they are both consistent and have the same
solutions. Indeed, let $k_1, k_2, \dots, k_n$ be an arbitrary solution of
system (1). Obviously, these numbers satisfy all the equations of (4)
other than the second, since those are unchanged. They likewise satisfy
the second equation of system (4): it suffices to recall how this equation
is expressed in terms of the second and first equations of system (1).
Conversely, any solution of (4) will also satisfy (1). Indeed, the
second equation of (1) is obtained by subtracting, from both members
of the second equation of (4), the corresponding members of the
first equation of the system multiplied by the number $-c$.

Quite naturally, if manipulations of this kind are applied several 
times to system (1), the newly obtained system of equations will remain 
equivalent to the original system (1). 
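
The manipulation and its reversibility can be tried out on a small example. In the sketch below each equation is stored as a row of coefficients followed by the constant term; the sample system $x_1 + 2x_2 = 7$, $x_1 + x_2 = 4$, the multiplier $c = 1$ and the helper names are all our own choices for illustration.

```python
from fractions import Fraction

def subtract_multiple(rows, src, dst, c):
    """Return a copy of the system in which c times equation src
    has been subtracted from equation dst."""
    new_rows = [list(r) for r in rows]
    new_rows[dst] = [d - c * s for d, s in zip(rows[dst], rows[src])]
    return new_rows

def is_solution(rows, values):
    """Each row is [a_1, ..., a_n, b]; check a_1*k_1 + ... + a_n*k_n == b."""
    return all(sum(a * k for a, k in zip(r[:-1], values)) == r[-1]
               for r in rows)

original = [[Fraction(1), Fraction(2), Fraction(7)],
            [Fraction(1), Fraction(1), Fraction(4)]]
solution = [Fraction(1), Fraction(3)]

# Subtract the first equation (c = 1) from the second ...
transformed = subtract_multiple(original, src=0, dst=1, c=Fraction(1))
# ... and undo the step by subtracting it again with the multiplier -c.
restored = subtract_multiple(transformed, src=0, dst=1, c=Fraction(-1))
```

The second equation becomes $-x_2 = -3$, the numbers 1, 3 satisfy both the original and the transformed system, and applying the same manipulation with $-c$ restores the original system, which is exactly the argument used for the converse.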

It may happen that as a result of such manipulations, there 
will appear in our system an equation whose coefficients in the 
left-hand member are equal to zero. Now if the constant term of this 
equation is zero, then the equation is satisfied for any values of the 
unknowns and so by discarding this equation we arrive at a system of 
equations equivalent to the original system. But if the constant term 
of the equation at hand is nonzero, then the equation cannot be 
satisfied for any values of the unknowns and for this reason the system 
obtained (and the equivalent original system as well) will be inconsistent. 

Let us now examine Gauss’ method. 

We are given an arbitrary system of linear equations (1). To be 
specific, suppose that the coefficient $a_{11} \neq 0$, though in reality it
may of course be equal to zero and then we would have to start with 
some other, nonzero, coefficient of the first equation of the system. 

Let us now transform system (1) by eliminating the unknown
$x_1$ from all equations except the first. To do this, multiply both members
of the first equation by the number $a_{21}/a_{11}$ and subtract from the
corresponding members of the second equation; then subtract both
members of the first equation, multiplied by $a_{31}/a_{11}$, from the
corresponding members of the third equation, and so on.

We thus arrive at a new system made up of s linear equations 
in n unknowns: 


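$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1,\\
a'_{22}x_2 + a'_{23}x_3 + \dots + a'_{2n}x_n &= b'_2,\\
a'_{32}x_2 + a'_{33}x_3 + \dots + a'_{3n}x_n &= b'_3,\\
& \vdots\\
a'_{s2}x_2 + a'_{s3}x_3 + \dots + a'_{sn}x_n &= b'_s.
\end{aligned}
\tag{5}
$$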


We do not need to write out explicitly the expressions of the new 
coefficients $a'_{ij}$ and the new constant terms $b'_i$ via the coefficients
and constant terms of the original system (1). 

As we know, the system of equations (5) is equivalent to (1). 
Now transform (5). We no longer involve the first equation and
manipulate only that portion of (5) consisting of all equations
except the first. We of course assume that there are no equations
with all coefficients of the left members zero (such would have been
rejected if their constant terms were likewise zero, and if that were
not so, we would have proved the inconsistency of our system).
Thus, among the coefficients $a'_{ij}$ there are some different from zero;
for definiteness, we put $a'_{22} \neq 0$. Now transform (5) by subtracting
from both members of the third and of each of the succeeding equations
both members of the second equation multiplied respectively
by the numbers

$$
\frac{a'_{32}}{a'_{22}}, \quad \frac{a'_{42}}{a'_{22}}, \quad \dots, \quad \frac{a'_{s2}}{a'_{22}}.
$$

In this way we eliminate the unknown $x_2$ from all equations, except
the first and second, and arrive at the following system of equations
which is equivalent to (5) and hence to (1):


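$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1,\\
a'_{22}x_2 + a'_{23}x_3 + \dots + a'_{2n}x_n &= b'_2,\\
a''_{33}x_3 + \dots + a''_{3n}x_n &= b''_3,\\
& \vdots\\
a''_{t3}x_3 + \dots + a''_{tn}x_n &= b''_t.
\end{aligned}
$$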


Our system now contains $t$ equations, $t \le s$, since some of the equations
were possibly discarded. Naturally the number of equations
of the system could already have diminished after eliminating
the unknown $x_1$. Subsequently, only a portion of the system obtained
(that containing all equations except the first two) will be subject 
to transformations. 

The question arises as to when this process of successive elimi- 
nation of unknowns will stop. 

If we arrive at a system in which one of the equations has a non- 
zero constant term and all the coefficients of the left member are 
equal to zero, then, as we know, our original system was incon- 
sistent. 

If that is not the case, then we obtain the following system of 
equations which is equivalent to system (1):

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1,k-1}x_{k-1} + a_{1k}x_k + \dots + a_{1n}x_n &= b_1,\\
a'_{22}x_2 + \dots + a'_{2,k-1}x_{k-1} + a'_{2k}x_k + \dots + a'_{2n}x_n &= b'_2,\\
& \vdots\\
a^{(k-2)}_{k-1,k-1}x_{k-1} + a^{(k-2)}_{k-1,k}x_k + \dots + a^{(k-2)}_{k-1,n}x_n &= b^{(k-2)}_{k-1},\\
a^{(k-1)}_{kk}x_k + \dots + a^{(k-1)}_{kn}x_n &= b^{(k-1)}_k.
\end{aligned}
\tag{6}
$$

Here $a_{11} \neq 0$, $a'_{22} \neq 0$, ..., $a^{(k-2)}_{k-1,k-1} \neq 0$, $a^{(k-1)}_{kk} \neq 0$. Note also that
$k \le s$ and, obviously, $k \le n$.
In this case system (1) is consistent. It will be determinate for k =n 


and indeterminate for k < n. 
Indeed, if $k = n$, then system (6) has the form 

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1, \\
a'_{22}x_2 + \ldots + a'_{2n}x_n &= b'_2, \\
\ldots\ldots\ldots\ldots & \\
a^{(n-1)}_{nn}x_n &= b^{(n-1)}_n
\end{aligned} \tag{7}$$

From the last equation we obtain a quite definite value for the 
unknown $x_n$. Substituting it into the next to the last equation, we 
find a uniquely defined value for the unknown $x_{n-1}$. Continuing in 
similar fashion, we find that system (7) and, for this reason, system 
(1) as well have a unique solution, that is to say, they are consistent 
and determinate. 

But if $k < n$, then for the “free” unknowns $x_{k+1}, \ldots, x_n$ we take 
arbitrary numerical values; then, moving in system (6) from bottom 
to top, we find quite definite values for the unknowns 
$x_k, x_{k-1}, \ldots, x_2, x_1$ (as above). Since the values for the free 
unknowns may be chosen in an infinity of ways, our system (6) and, 
hence, (1) as well are consistent but indeterminate. It is easy to 
verify that by using the foregoing method (given all possible choices of 
values for the free unknowns) we can find all the solutions of system (1). 

At first glance, yet another form to which a system of linear 
equations may be reduced by the Gaussian method would appear 
possible, namely, the form obtained by adjoining to system (7) a number 
of equations containing only the unknown $x_n$. Actually, however, 
in this case the transformations have simply not been completed: 
since $a^{(n-1)}_{nn} \neq 0$, the unknown $x_n$ may be eliminated in all 
equations from the (n + 1)th on. 

Note that the “triangular” form of the system of equations (7) 
or the “trapezoidal” form of system (6) (for $k < n$) resulted from the 
assumption that the coefficients $a_{11}$, $a'_{22}$, etc. are different from zero. 
In the general case, the system of equations which we arrive at after 
completing the process of elimination of unknowns takes on a trian- 
gular or trapezoidal form only after an appropriate alteration in the 
numbering of the unknowns. 

To summarize, then, we find that the Gaussian method is applicable 
to any system of linear equations. The system is inconsistent if after 
the transformations we obtain an equation in which the coefficients of 
all unknowns are zero and the constant term is nonzero; but if no such 
equation is encountered, the system is consistent. A consistent system of 
equations is determinate if it reduces to the triangular form (7) and 
indeterminate if it reduces to the trapezoidal form (6) for k < n. 
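
The summary above can be sketched in code. The following is a minimal illustration, not the text's own procedure verbatim: the function name `classify` and the use of exact rational arithmetic are assumptions made here, and skipping a column with no usable coefficient stands in for the "alteration in the numbering of the unknowns" mentioned above.

```python
from fractions import Fraction

def classify(augmented):
    """Reduce an augmented matrix by Gaussian elimination and classify the system.

    Each row holds the coefficients of one equation followed by its constant term.
    Returns "inconsistent", "determinate" or "indeterminate".
    """
    rows = [[Fraction(v) for v in row] for row in augmented]
    n = len(rows[0]) - 1          # number of unknowns
    k = 0                         # number of equations kept in the reduced system
    for col in range(n):
        # Look for an equation, among those not yet used, with a nonzero coefficient.
        pivot = next((r for r in range(k, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue              # this unknown will be treated as free
        rows[k], rows[pivot] = rows[pivot], rows[k]
        for r in range(k + 1, len(rows)):
            factor = rows[r][col] / rows[k][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[k])]
        k += 1
    # An equation 0 = b with b != 0 proves the system inconsistent.
    if any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in rows):
        return "inconsistent"
    return "determinate" if k == n else "indeterminate"
```

A homogeneous system with fewer equations than unknowns can never reach `k == n`, so under these assumptions the sketch always reports it as indeterminate.
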

Let us apply what has been said to the case of a system of homogeneous 
linear equations, that is, equations whose constant terms 
are zero. Such a system is always consistent since it has a zero solu- 
tion (0, 0, ..., 0). Suppose that in the system at hand the number 
of equations is less than the number of unknowns. Then our system 
cannot reduce to the triangular form since in the Gaussian elimina- 
tion process the number of equations of the system can diminish 
but not increase; hence, it reduces to the trapezoidal form and so 
is indeterminate. 

To put it otherwise, if in a system of homogeneous linear equations 
the number of equations is less than the number of unknowns, then this 
system has, in addition to the zero solution, nonzero solutions, that is, 
solutions in which the values of some (or even all) unknowns are 
nonzero. There is an infinity of such solutions. 

In practical solutions of a system of linear equations by the 
Gaussian method, one should write down the matrix of the coeffi- 
cients of the system and adjoin a column made up of the constant 
terms, which, for the sake of convenience, are separated by a vertical 
line, and then perform all the manipulations on the rows of this 
“augmented” matrix. 


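Example 1. Solve the system 

$$\begin{aligned}
x_1 + 2x_2 + 5x_3 &= -9, \\
x_1 - x_2 + 3x_3 &= 2, \\
3x_1 - 6x_2 - x_3 &= 25
\end{aligned}$$

Transform the augmented matrix of the system: 

$$\left(\begin{array}{rrr|r} 1 & 2 & 5 & -9 \\ 1 & -1 & 3 & 2 \\ 3 & -6 & -1 & 25 \end{array}\right)
\to \left(\begin{array}{rrr|r} 1 & 2 & 5 & -9 \\ 0 & -3 & -2 & 11 \\ 0 & -12 & -16 & 52 \end{array}\right)
\to \left(\begin{array}{rrr|r} 1 & 2 & 5 & -9 \\ 0 & -3 & -2 & 11 \\ 0 & 0 & -8 & 8 \end{array}\right)$$

We thus arrive at the following system of equations: 

$$\begin{aligned}
x_1 + 2x_2 + 5x_3 &= -9, \\
-3x_2 - 2x_3 &= 11, \\
-8x_3 &= 8
\end{aligned}$$

which has the unique solution 

$$x_1 = 2, \quad x_2 = -3, \quad x_3 = -1$$

The original system proved to be determinate. 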

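Example 2. Solve the system 

$$\begin{aligned}
x_1 - 5x_2 - 8x_3 + x_4 &= 3, \\
3x_1 + x_2 - 3x_3 - 5x_4 &= 1, \\
x_1 - 7x_3 + 2x_4 &= -5, \\
11x_2 + 20x_3 - 9x_4 &= 2
\end{aligned}$$

We transform the augmented matrix of the system: 

$$\left(\begin{array}{rrrr|r} 1 & -5 & -8 & 1 & 3 \\ 3 & 1 & -3 & -5 & 1 \\ 1 & 0 & -7 & 2 & -5 \\ 0 & 11 & 20 & -9 & 2 \end{array}\right)
\to \left(\begin{array}{rrrr|r} 1 & -5 & -8 & 1 & 3 \\ 0 & 16 & 21 & -8 & -8 \\ 0 & 5 & 1 & 1 & -8 \\ 0 & 11 & 20 & -9 & 2 \end{array}\right)$$

$$\to \left(\begin{array}{rrrr|r} 1 & -5 & -8 & 1 & 3 \\ 0 & -89 & 0 & -29 & 160 \\ 0 & 5 & 1 & 1 & -8 \\ 0 & -89 & 0 & -29 & 162 \end{array}\right)
\to \left(\begin{array}{rrrr|r} 1 & -5 & -8 & 1 & 3 \\ 0 & -89 & 0 & -29 & 160 \\ 0 & 5 & 1 & 1 & -8 \\ 0 & 0 & 0 & 0 & 2 \end{array}\right)$$

We arrive at a system containing the equation 0 = 2. Consequently, the original 
system is inconsistent. 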

Example 3. Solve the system 

$$\begin{aligned}
4x_1 + x_2 - 3x_3 - x_4 &= 0, \\
2x_1 + 3x_2 + x_3 - 5x_4 &= 0, \\
x_1 - 2x_2 - 2x_3 + 3x_4 &= 0
\end{aligned}$$

This is a system of homogeneous equations, and the number of equations 
is less than the number of unknowns; it must therefore be indeterminate. Since 
all the constant terms are zero, we perform manipulations solely with the 
matrix of the coefficients of the system: 

$$\begin{pmatrix} 4 & 1 & -3 & -1 \\ 2 & 3 & 1 & -5 \\ 1 & -2 & -2 & 3 \end{pmatrix}
\to \begin{pmatrix} 0 & 9 & 5 & -13 \\ 0 & 7 & 5 & -11 \\ 1 & -2 & -2 & 3 \end{pmatrix}
\to \begin{pmatrix} 0 & 2 & 0 & -2 \\ 0 & 7 & 5 & -11 \\ 1 & -2 & -2 & 3 \end{pmatrix}$$

We arrive at the following system of equations: 

$$\begin{aligned}
2x_2 - 2x_4 &= 0, \\
7x_2 + 5x_3 - 11x_4 &= 0, \\
x_1 - 2x_2 - 2x_3 + 3x_4 &= 0
\end{aligned}$$

We can take either one of the unknowns $x_2$ or $x_4$ for the free unknown. 
Let $x_4 = \alpha$. Then from the first equation it follows that $x_2 = \alpha$, and from 
the second equation we get $x_3 = \frac{4}{5}\alpha$ and, finally, from the third equation 
$x_1 = \frac{3}{5}\alpha$. Thus, 

$$\frac{3}{5}\alpha, \quad \alpha, \quad \frac{4}{5}\alpha, \quad \alpha$$

is the general form of the solutions of the given system of equations. 
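
A general solution of this kind is easy to check by substitution. The sketch below is only an illustration: it assumes the free unknown is $x_4 = \alpha$ and the general solution $(\frac{3}{5}\alpha, \alpha, \frac{4}{5}\alpha, \alpha)$; the names `solution` and `residuals` are introduced here and do not come from the text.

```python
from fractions import Fraction

# Coefficient matrix of the homogeneous system of Example 3.
A = [[4, 1, -3, -1],
     [2, 3, 1, -5],
     [1, -2, -2, 3]]

def solution(alpha):
    """General solution with the free unknown x4 = alpha."""
    alpha = Fraction(alpha)
    return [Fraction(3, 5) * alpha, alpha, Fraction(4, 5) * alpha, alpha]

def residuals(x):
    """Left members of the three equations evaluated at x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

# Every choice of alpha should make all three left members vanish.
for alpha in (0, 1, -2, Fraction(5, 3)):
    assert residuals(solution(alpha)) == [0, 0, 0]
```
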


2. Determinants of Second and Third Order 


The method of solving systems of linear equations given in 
Sec. 1 is extremely simple and requires the performance of the same 
kind of computations, which are readily carried out on computing 
machines. Its drawback, however, is that it does not enable us to 
state the conditions of consistency or determinacy of the system by 
means of coefficients and constant terms of the system. On the other 
hand, even in the case of a determinate system, this method does 
not permit finding formulas that express the solution of the system 
in terms of its coefficients and constant terms. However, all this 
proves to be necessary in theoretical problems, in particular, in geo- 
metrical investigations; for this reason, the theory of systems of 
linear equations has to be elaborated by different and more profound 
methods. The general case will be pursued in the next chapter; for 
the present, we consider determinate systems having an equal num- 
ber of equations and unknowns. We begin with the systems in two 
and three unknowns of elementary algebra. 

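Let there be given a system of two linear equations in two unknowns: 

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1, \\
a_{21}x_1 + a_{22}x_2 &= b_2
\end{aligned} \tag{1}$$

whose coefficients form the second-order square matrix 

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \tag{2}$$

Applying to system (1) the method of equalizing the coefficients, we 
obtain 

$$\begin{aligned}
(a_{11}a_{22} - a_{12}a_{21})\,x_1 &= b_1a_{22} - a_{12}b_2, \\
(a_{11}a_{22} - a_{12}a_{21})\,x_2 &= a_{11}b_2 - b_1a_{21}
\end{aligned}$$

Suppose that $a_{11}a_{22} - a_{12}a_{21} \neq 0$. Then 

$$x_1 = \frac{b_1a_{22} - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}}, \quad
x_2 = \frac{a_{11}b_2 - b_1a_{21}}{a_{11}a_{22} - a_{12}a_{21}} \tag{3}$$

It is easy to show, by substituting the values of the unknowns into 
(1), that (3) is a solution of system (1). The question of the 
uniqueness of this solution will be considered in Sec. 7. 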

The common denominator of the values of the unknowns (3) is 
very simply expressed in terms of the elements of matrix (2): it is 
equal to the product of the elements of the principal diagonal minus 
the product of the elements of the secondary diagonal. This number 
is called the determinant of the matrix (2); we call it a second-order 
determinant since the matrix (2) is a second-order matrix. To symbolize 
a determinant, we use vertical lines in place of parentheses: 

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{4}$$

Examples. 

(1) $\begin{vmatrix} 3 & 7 \\ 1 & 4 \end{vmatrix} = 3 \cdot 4 - 7 \cdot 1 = 5$, 

(2) $\begin{vmatrix} 1 & -2 \\ 3 & 5 \end{vmatrix} = 1 \cdot 5 - (-2) \cdot 3 = 11$ 

It is worth stressing once again that, while a matrix is an array 
of numbers, a determinant is a number associated in a definite way 
with a square matrix. The products $a_{11}a_{22}$ and $a_{12}a_{21}$ are called the 
terms of a second-order determinant. 

The numerators of expressions (3) have the same form as the 
denominators, that is, they are also determinants of second order; 
the numerator of the expression for $x_1$ is the determinant of the 
matrix obtained from matrix (2) by replacing its first column by the 
column of constant terms of system (1), the numerator of the 
expression for $x_2$ is the determinant of the matrix obtained from matrix 
(2) by replacing its second column by the same column. We can now 
write formula (3) as follows: 

$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \quad
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} \tag{5}$$

This rule for the solution of a system of two linear equations 
in two unknowns (called Cramer’s rule) is formulated as follows. 

If the determinant, (4), of the coefficients of a system of equations, 
(1), is different from zero, we obtain the solution of system (1) by taking 
for the values of the unknowns the fractions whose common denominator 
is determinant (4) and whose numerator for the unknown $x_i$ ($i = 1, 2$) 
is a determinant obtained by replacing in determinant (4) the ith column 
(that is, the column of coefficients of the desired unknown) by the column 
of the constant terms of system (1).* 
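
As a small illustration of the rule just stated, here is a sketch for two equations in two unknowns. The function names `det2` and `cramer2` are introduced here for illustration, integer coefficients are assumed so that exact fractions can be returned, and the sample system in the last lines is hypothetical.

```python
from fractions import Fraction

def det2(a11, a12, a21, a22):
    """Second-order determinant: principal diagonal minus secondary diagonal."""
    return a11 * a22 - a12 * a21

def cramer2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2 by Cramer's rule."""
    d = det2(a11, a12, a21, a22)
    if d == 0:
        raise ValueError("determinant is zero; Cramer's rule does not apply")
    d1 = det2(b1, a12, b2, a22)   # first column replaced by the constant terms
    d2 = det2(a11, b1, a21, b2)   # second column replaced by the constant terms
    return Fraction(d1, d), Fraction(d2, d)

# Hypothetical sample system: x1 + 2*x2 = 5, 3*x1 + 4*x2 = 6.
x1, x2 = cramer2(1, 2, 3, 4, 5, 6)
assert x1 + 2 * x2 == 5 and 3 * x1 + 4 * x2 == 6
```
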


Example. Solve the system 

$$\begin{aligned}
2x_1 + x_2 &= 7, \\
x_1 - 3x_2 &= -2
\end{aligned}$$

The determinant of the coefficients is 

$$d = \begin{vmatrix} 2 & 1 \\ 1 & -3 \end{vmatrix} = -7$$

It is different from zero and, for this reason, Cramer’s rule is applicable. 
The determinants 

$$d_1 = \begin{vmatrix} 7 & 1 \\ -2 & -3 \end{vmatrix} = -19, \quad
d_2 = \begin{vmatrix} 2 & 7 \\ 1 & -2 \end{vmatrix} = -11$$

are the numerators for the unknowns. Thus, the following set of numbers is the 
solution of our system: 

$$x_1 = \frac{d_1}{d} = \frac{19}{7}, \quad x_2 = \frac{d_2}{d} = \frac{11}{7}$$

* For brevity we speak here of replacing columns “in the determinant”. 
In the same way, we will in future, if it is more convenient, speak of rows and 
columns of a determinant, of its elements and diagonals, etc. 

The introduction of second-order determinants does not sub- 
stantially simplify the solution of a system of two linear equations 
in two unknowns, which does not present any difficulties as it is. 
However, for the case of systems of three linear equations in three 
unknowns, similar methods are of practical utility. Suppose we 
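have a system 

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1, \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2, \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3
\end{aligned} \tag{6}$$

with the coefficient matrix 

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \tag{7}$$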


If we multiply both sides of the first equation of (6) by the number 
$a_{22}a_{33} - a_{23}a_{32}$, both sides of the second equation by 
$a_{13}a_{32} - a_{12}a_{33}$, both sides of the third equation by 
$a_{12}a_{23} - a_{13}a_{22}$, and then add all three equations, it is easy to 
verify that the coefficients of $x_2$ and $x_3$ will turn out to be zero, 
that is, these unknowns are eliminated simultaneously and we obtain 
the equation 

$$\begin{aligned}
&(a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32})\,x_1 \\
&\qquad = b_1a_{22}a_{33} + a_{12}a_{23}b_3 + a_{13}b_2a_{32} - a_{13}a_{22}b_3 - a_{12}b_2a_{33} - b_1a_{23}a_{32}
\end{aligned} \tag{8}$$

Here, the coefficient of $x_1$ is called a third-order determinant 
corresponding to matrix (7). The symbolism is the same as in the case 
of second-order determinants; thus, 

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \tag{9}$$


The expression for a third-order determinant is rather involved, 
but the rule for its formation from the elements of matrix (7) is extre- 
mely simple, as witness: one of the three terms (of the determinant) 
in (9) with the plus sign is the product of the elements of the prin- 
cipal diagonal, each of the other two is a product of the elements 
lying parallel to this diagonal, with the third factor added from 
the opposite corner of the matrix. The terms with the minus sign 
in (9) are constructed in a similar manner but relative to the 
secondary diagonal. We obtain a technique for computing determinants 
of the third order that produces quick results (after a certain amount 
of practice). Fig. 1 gives a schematic view of computing the positive 
terms (left) and the negative terms (right) of a third-order 
determinant. 

Fig. 1 


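Examples. 

(1) $\begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 2 & 3 & 5 \end{vmatrix}
= 2 \cdot 3 \cdot 5 + 1 \cdot 1 \cdot 2 + 2 \cdot (-4) \cdot 3 - 2 \cdot 3 \cdot 2 - 1 \cdot (-4) \cdot 5 - 2 \cdot 1 \cdot 3$ 
$= 30 + 2 - 24 - 12 + 20 - 6 = 10$ 

(2) $\begin{vmatrix} 1 & 0 & -5 \\ -2 & 3 & 2 \\ 1 & -2 & 0 \end{vmatrix}
= 1 \cdot 3 \cdot 0 + 0 \cdot 2 \cdot 1 + (-5) \cdot (-2) \cdot (-2) - (-5) \cdot 3 \cdot 1 - 0 \cdot (-2) \cdot 0 - 1 \cdot 2 \cdot (-2)$ 
$= -20 + 15 + 4 = -1$ 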
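
The rule for forming the six terms translates directly into code. The following sketch assumes the matrix is given as three rows of three numbers; the name `det3` is introduced here for illustration.

```python
def det3(m):
    """Third-order determinant by the rule of formula (9).

    `m` is a list of three rows, each a list of three numbers.
    """
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    # Terms taken with the plus sign: principal diagonal and its two "parallels".
    plus = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
    # Terms taken with the minus sign: the same construction on the secondary diagonal.
    minus = a13 * a22 * a31 + a12 * a21 * a33 + a11 * a23 * a32
    return plus - minus

print(det3([[2, 1, 2], [-4, 3, 1], [2, 3, 5]]))   # 10
```
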


The right-hand side of (8) is also a third-order determinant, 
namely, the determinant of the matrix obtained from matrix (7) by 
replacing its first column by the column of constant terms of system 
(6). If we denote determinant (9) by the letter $d$ and the determinant 
obtained by replacing its $j$th column ($j = 1, 2, 3$) by the column 
of constant terms of system (6) by the symbol $d_j$, then equation (8) 
becomes $dx_1 = d_1$, whence, for $d \neq 0$, it follows that 

$$x_1 = \frac{d_1}{d} \tag{10}$$

In exactly the same way, by multiplying equations (6) by the 
numbers $a_{23}a_{31} - a_{21}a_{33}$, $a_{11}a_{33} - a_{13}a_{31}$, $a_{13}a_{21} - a_{11}a_{23}$, 
respectively, we obtain for $x_2$ the following expression (again for $d \neq 0$): 

$$x_2 = \frac{d_2}{d} \tag{11}$$

Finally, multiplying these equations, respectively, by $a_{21}a_{32} - a_{22}a_{31}$, 
$a_{12}a_{31} - a_{11}a_{32}$, $a_{11}a_{22} - a_{12}a_{21}$, we arrive at the expression for $x_3$: 

$$x_3 = \frac{d_3}{d} \tag{12}$$

Substituting expressions (10) to (12) into equations (6) (it is 
of course assumed that the determinants $d$ and all $d_j$ are written 
in expanded form), we would find, after cumbersome computations, 
all, however, well within the grasp of the reader, that all these 
equations are satisfied, that is, that the numbers (10)-(12) constitute 
the solution of system (6). Thus, if the determinant of the 
coefficients of a system of three linear equations in three unknowns is nonzero, 
then the solution of this system may be found by Cramer's rule as stated 
for the case of a system of two equations. In Sec. 7 the reader will find 
a different proof of this assertion (one that does not rely on the 
calculations we have omitted here) and also a proof of the uniqueness 
of the solution (10)-(12) of system (6) for the more general case. 


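Example. Solve the system of equations 

$$\begin{aligned}
2x_1 - x_2 + x_3 &= 0, \\
3x_1 + 2x_2 - 5x_3 &= 1, \\
x_1 + 3x_2 - 2x_3 &= 4
\end{aligned}$$

The determinant of the coefficients is nonzero: 

$$d = \begin{vmatrix} 2 & -1 & 1 \\ 3 & 2 & -5 \\ 1 & 3 & -2 \end{vmatrix} = 28$$

so the Cramer rule is applicable. The numerators for the unknowns are 

$$d_1 = \begin{vmatrix} 0 & -1 & 1 \\ 1 & 2 & -5 \\ 4 & 3 & -2 \end{vmatrix} = 13, \quad
d_2 = \begin{vmatrix} 2 & 0 & 1 \\ 3 & 1 & -5 \\ 1 & 4 & -2 \end{vmatrix} = 47, \quad
d_3 = \begin{vmatrix} 2 & -1 & 0 \\ 3 & 2 & 1 \\ 1 & 3 & 4 \end{vmatrix} = 21$$

Hence, the following numbers constitute the solution of the system: 

$$x_1 = \frac{13}{28}, \quad x_2 = \frac{47}{28}, \quad x_3 = \frac{21}{28} = \frac{3}{4}$$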
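
Cramer's rule for three unknowns can be sketched along the same lines. The helper names `det3` and `cramer3` are assumptions of this illustration, and integer coefficients are assumed so that exact fractions can be returned.

```python
from fractions import Fraction

def det3(m):
    """Third-order determinant by formula (9)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a13 * a22 * a31 - a12 * a21 * a33 - a11 * a23 * a32)

def cramer3(a, b):
    """Solve the system with coefficient rows `a` and constant terms `b`."""
    d = det3(a)
    if d == 0:
        raise ValueError("determinant is zero; Cramer's rule does not apply")
    solution = []
    for j in range(3):
        # d_j: replace the j-th column of the coefficients by the constant terms.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det3(a_j), d))
    return solution

coefficients = [[2, -1, 1], [3, 2, -5], [1, 3, -2]]
constants = [0, 1, 4]
x = cramer3(coefficients, constants)
# Substituting back reproduces the constant terms.
assert [sum(c * v for c, v in zip(row, x)) for row in coefficients] == constants
```
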


3. Arrangements and Permutations 


In the study of determinants of order n we will need certain 
concepts and facts relating to finite sets. Suppose we have a certain 
finite set M consisting of n elements, which may be enumerated by 
using the natural numbers 1, 2, ..., n; since the properties of the 
elements of the set M will not play any role whatsoever, we simply 
say that the elements of M are the numbers 1, 2, ..., n. 

Besides the natural order of 1, 2, ..., n, we can arrange the 
numbers in many other ways. Thus, we can arrange the numbers 
1, 2, 3, 4 as 3, 1, 2, 4 or 2, 4, 1, 3 and so on. Every rearrangement 
of the numbers 1, 2, ..., n in any definite order is called a 
permutation (or arrangement)* of n numbers (or n symbols). 

The number of distinct arrangements of n symbols is equal to the 
product $1 \cdot 2 \cdots n$, denoted by n! (read “n factorial”). Indeed, the 
general form of an arrangement of n symbols is $i_1, i_2, \ldots, i_n$, where 
each of the $i_s$ is one of the numbers 1, 2, ..., n, without repetitions. 
Use any one of the numbers 1, 2, ..., n for $i_1$; this yields n distinct 
possibilities. But if $i_1$ has been chosen, then for $i_2$ we can only take 
one of the remaining $n - 1$ numbers; that is, the number of different 
ways of choosing the symbols $i_1$ and $i_2$ is equal to the product 
$n(n - 1)$, and so on. 

Thus, the number of arrangements of n symbols for n = 2 is 2! = 2 
(the arrangements 12 and 21; in examples where $n \le 9$, we 
do not separate the symbols by commas); for n = 3 this number is 
3! = 6, for n = 4 it is 4! = 24. As n increases, the number of arran- 
gements increases very fast: for n = 5 it is 5! = 120, and for n = 10 
it is already 3,628,800. 
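
The count can be confirmed by brute force for small n. This sketch relies on the standard-library `itertools.permutations`, which enumerates exactly what the text calls arrangements; the function name `count_arrangements` is introduced here.

```python
from itertools import permutations
from math import factorial

def count_arrangements(n):
    """Count the arrangements of the symbols 1, 2, ..., n by listing them."""
    return sum(1 for _ in permutations(range(1, n + 1)))

for n in range(1, 7):
    assert count_arrangements(n) == factorial(n)

print(factorial(10))   # 3628800
```
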

If in a certain arrangement we interchange any two symbols 
(not necessarily adjacent) and leave all the remaining ones fixed, we 
obtain a new arrangement. This operation is called a transposition. 

All n! arrangements of n symbols may be ordered so that each is 
obtained from the preceding one via a single transposition; any 
arrangement can serve as the starting point. 

This assertion holds true for n = 2: if it is required to begin 
with the arrangement 12, the desired order will be 12, 21; if we 
begin with the arrangement 21, then the order will be 21, 12. Sup- 
pose our assertion has already been proved for n — 1, and we prove 
it for n. Let us begin with the arrangement 

$$i_1, i_2, \ldots, i_n \tag{1}$$

We consider all arrangements of n symbols starting with $i_1$. There 
are $(n - 1)!$ such arrangements and they may be ordered in accord 
with the requirements of the theorem, beginning with (1), since this 
actually reduces to an ordering of all arrangements of $n - 1$ symbols; 
this ordering, by the induction hypothesis, may be initiated 
from any arrangement, say, $i_2, \ldots, i_n$. In the last of the 
arrangements of n symbols thus obtained we perform a transposition of $i_1$ 
and any other symbol (say $i_2$) and, again beginning with the 
arrangement obtained, we appropriately order all the arrangements with 
$i_2$ in first place, and so forth. It is thus obviously possible to 
enumerate all arrangements of n symbols. 
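
The inductive construction in this proof can be turned into a short recursive sketch. The function name `transposition_order` and the choice of which symbol to bring into first place next (the first one not yet used there) are assumptions of this illustration; the proof allows any such choice.

```python
def transposition_order(start):
    """List all arrangements of the symbols in `start`, beginning with `start`,
    so that each one differs from the preceding one by a single transposition."""
    start = tuple(start)
    if len(start) <= 1:
        return [start]
    result = []
    used_first = set()            # symbols that have already stood in first place
    current = start
    while True:
        first = current[0]
        used_first.add(first)
        # Induction step: order all arrangements of the remaining n - 1 symbols,
        # starting from the tail of the current arrangement.
        for tail in transposition_order(current[1:]):
            result.append((first,) + tail)
        last = result[-1]
        # Transpose the leading symbol with one that has not yet been in first place.
        candidates = [k for k in range(1, len(last)) if last[k] not in used_first]
        if not candidates:
            return result
        k = candidates[0]
        nxt = list(last)
        nxt[0], nxt[k] = nxt[k], nxt[0]
        current = tuple(nxt)

print(transposition_order((1, 2, 3)))
# [(1, 2, 3), (1, 3, 2), (3, 1, 2), (3, 2, 1), (2, 3, 1), (2, 1, 3)]
```
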


* Translator’s note: the term “arrangement” will be used, since 
permutation is reserved in this text for a different concept. 

From this theorem it follows that it is possible to pass from any 
arrangement of n symbols to any other arrangement of the same sym- 
bols by means of several transpositions. 

We say that in a given arrangement the numbers i and j constitute 
an inversion if $i > j$ but i comes before j in the arrangement. 
An arrangement is termed even if its symbols form an even number 
of inversions, otherwise it is odd. Thus, the arrangement 1, 2, ..., n 
is even for any n since the number of inversions here is zero. The 
arrangement 451362 (n = 6) contains 8 inversions and so is even. 
The arrangement 38524671 (n = 8) contains 15 inversions and so is 
odd. 
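
Counting inversions is mechanical, so the two examples are easy to check. The function names `inversions` and `is_even` in this sketch are introduced here for illustration.

```python
def inversions(arrangement):
    """Number of pairs (i, j) with i > j in which i comes before j."""
    a = list(arrangement)
    return sum(1 for p in range(len(a)) for q in range(p + 1, len(a)) if a[p] > a[q])

def is_even(arrangement):
    """An arrangement is even when its number of inversions is even."""
    return inversions(arrangement) % 2 == 0

print(inversions([4, 5, 1, 3, 6, 2]))         # 8
print(inversions([3, 8, 5, 2, 4, 6, 7, 1]))   # 15
```
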

Every transposition changes the parity of the arrangement. 

To prove this important theorem let us first consider the case 
where the symbols i and j being interchanged are adjacent; in other 
words, the arrangement is of the form ..., i, j, ..., where the 
dots stand for symbols unaltered by the transposition. The 
transposition converts our arrangement into the arrangement ..., j, 
i, ..., it being understood that in both cases each of the symbols i, j 
constitutes the same set of inversions with the symbols which remain 
fixed. Whereas earlier i and j did not constitute an inversion, in the 
new arrangement there is a fresh inversion; hence, the number of 
inversions has increased by unity; contrariwise, if they originally 
formed an inversion, then the inversion now vanishes, the number 
of inversions being diminished by unity. In both cases the parity 
of the arrangement is altered. 

Now let us suppose that there are s symbols, $s > 0$, between 
i and j; that is, the arrangement is of the form 

$$\ldots, i, k_1, k_2, \ldots, k_s, j, \ldots \tag{2}$$

The symbols i and j may be interchanged by means of a succession 
of $2s + 1$ transpositions of adjacent elements. These are 
transpositions interchanging the symbols $i$ and $k_1$, then interchanging $i$ 
(now in the place of $k_1$) and $k_2$, and so on until $i$ occupies the site 
of symbol $k_s$. These s transpositions are then followed by a 
transposition that interchanges the symbols i and j and then s 
transpositions of the symbol j with all k’s; as a result, j occupies the place of 
i and the symbols k return to their original sites. We have thus 
changed the parity of the arrangement an odd number of times 
and for this reason the arrangements (2) and 

$$\ldots, j, k_1, k_2, \ldots, k_s, i, \ldots \tag{3}$$

are of different parity. 

For $n \ge 2$, the number of even arrangements of n symbols is equal 
to the number of odd arrangements, i.e., $\frac{1}{2}n!$. 

Indeed, proceeding from the foregoing, order all arrangements 
of n symbols so that each one is obtained from the preceding one 
by a single transposition. Adjacent arrangements will have opposite 
parity, that is, the arrangements are ordered so that even and 
odd arrangements alternate. Our assertion now follows from the 
obvious remark that for $n \ge 2$ the number n! is even. 
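
Both statements, that every transposition changes the parity and that even and odd arrangements are equally numerous, can be checked exhaustively for small n. The sketch below is a brute-force check, not a proof; the function names are introduced here.

```python
from itertools import combinations, permutations
from math import factorial

def inversions(arrangement):
    a = list(arrangement)
    return sum(1 for p in range(len(a)) for q in range(p + 1, len(a)) if a[p] > a[q])

def every_transposition_flips_parity(n):
    """Try every transposition (adjacent or not) on every arrangement of 1..n."""
    for arr in permutations(range(1, n + 1)):
        for p, q in combinations(range(n), 2):
            swapped = list(arr)
            swapped[p], swapped[q] = swapped[q], swapped[p]
            if inversions(arr) % 2 == inversions(swapped) % 2:
                return False
    return True

def parity_counts(n):
    """Return (number of even arrangements, number of odd arrangements)."""
    even = sum(1 for arr in permutations(range(1, n + 1)) if inversions(arr) % 2 == 0)
    return even, factorial(n) - even

assert every_transposition_flips_parity(4)
assert parity_counts(4) == (12, 12)
```
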

Let us now define a new concept, that of a permutation of degree n. 
Write down two arrangements of n symbols, one under the other, 
and place parentheses around them; for example, for n = 5, 

$$\begin{pmatrix} 3 & 5 & 1 & 4 & 2 \\ 5 & 2 & 3 & 4 & 1 \end{pmatrix} \tag{4}$$

In this example,* 5 stands under 3, 2 under 5, etc. We say that 
number 3 goes into 5, 5 goes into 2, 1 goes into 3, and the number 
4 goes into 4 (or remains fixed) and, finally, 2 goes into 1. Thus, 
two arrangements written one under the other in the form shown in 
(4) define a certain one-to-one mapping of the set of the first five natural 
numbers onto itself, that is, a mapping in which each of the natu- 
ral numbers 1, 2, 3, 4, 5 is associated with one of these same natural 
numbers, distinct numbers corresponding to distinct numbers. And 
since there are only five numbers (a finite set), each one corresponds 
to one of the five numbers 1, 2, 3, 4, 5, namely, that one into which 
it “goes”. 

It is clear that the one-to-one mapping of the set of the first 
five natural numbers which we obtained by means of (4) could be 
obtained by writing certain other pairs of arrangements of five sym- 
bols one under the other. These are obtained from (4) by means of 
several transpositions of the columns, such as, for instance, 


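$$\begin{pmatrix} 2 & 1 & 5 & 3 & 4 \\ 1 & 3 & 2 & 5 & 4 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 5 & 2 & 4 & 3 \\ 3 & 2 & 1 & 4 & 5 \end{pmatrix}, \quad
\begin{pmatrix} 2 & 5 & 1 & 4 & 3 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} \tag{5}$$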

In all these groups, 3 goes into 5, 5 into 2, etc. 
Similarly, two arrangements of n symbols written one under the 
other define a one-to-one mapping of the set of the first n natural 
numbers onto itself. Any one-to-one mapping A of the set of the 
first n natural numbers onto itself is termed a permutation of degree n. 


Obviously, any permutation A may be written with the help of two 
arrangements, written one under the other: 


$$A = \begin{pmatrix} i_1 & i_2 & \ldots & i_n \\ \alpha_{i_1} & \alpha_{i_2} & \ldots & \alpha_{i_n} \end{pmatrix} \tag{6}$$

* This array looks like a matrix of two rows and five columns, but its 
meaning is quite different. 

Here, $\alpha_i$ denotes the number into which i (i = 1, 2, ..., n) goes 
in the permutation A. 

The permutation A possesses many different notations of the 
form (6). For instance, (4) and (5) are different ways of denoting 
one and the same permutation of degree 5. 

It is possible to pass from one mode of notation of the permuta- 
tion A to another simply by performing a number of transpositions 
of the columns. It is then possible to obtain (6) in a mode such that 
the upper (or lower) row is any preassigned arrangement of n symbols. 
In particular, any permutation A of degree n may be written as 


$$A = \begin{pmatrix} 1 & 2 & \ldots & n \\ \alpha_1 & \alpha_2 & \ldots & \alpha_n \end{pmatrix} \tag{7}$$

that is, with the numbers in the upper row arranged in their natural 
order. Given this notation, various permutations differ in the 
arrangements of the lower row, and for this reason the number of 
permutations of degree n is equal to the number of arrangements of n symbols, 
or n!. 

An instance of an nth-degree permutation is the identity 
permutation 

$$E = \begin{pmatrix} 1 & 2 & \ldots & n \\ 1 & 2 & \ldots & n \end{pmatrix}$$

in which all symbols remain fixed. 

It is well to point out that the upper and lower rows of the per- 
mutation A in notation (6) play different roles so that if interchanged 
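the result would be a different permutation. Thus, the permutations 
of degree 4 

$$\begin{pmatrix} 2 & 1 & 4 & 3 \\ 4 & 3 & 1 & 2 \end{pmatrix} \quad \text{and} \quad
\begin{pmatrix} 4 & 3 & 1 & 2 \\ 2 & 1 & 4 & 3 \end{pmatrix}$$

are different: in the first, 2 goes into 4, in the second it goes into 3. 

Let us take some permutation A of degree n in the arbitrary 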
notation (6). The arrangements constituting the upper and lower 
rows in this mode can have either identical or opposite parities. 
As we know, we can proceed to any other mode of permutation A by 
means of successive transpositions in the upper row and correspond- 
ing transpositions in the lower row. However, by performing one 
transposition in the upper row of (6) and one transposition of the 
corresponding elements in the lower row, we simultaneously alter 
the parities of both arrangements and therefore preserve the coinci- 
dent or opposite nature of these parities. From this it follows that 
in all modes of notation of the permutation A, the parities of the upper 
and lower rows either coincide or are opposite. In the former case, A is 
called even, in the latter, odd. In particular, the identity 
permutation is even. 

If the permutation A is written as (7) (that is, with the even 
arrangement 1, 2, ..., n in the upper row), then the parity of 
permutation A is determined by the parity of the arrangement 
$\alpha_1, \alpha_2, \ldots, \alpha_n$ of the lower row. Whence it follows that the number 
of even permutations of degree n is equal to the number of odd 
permutations, that is, $\frac{1}{2}n!$. 


The definition of parity of a permutation may be cast in the follo- 
wing modified form. If, when written in mode (6), the parities of 
both rows coincide, then the number of inversions is either even 
in both rows or is odd in both, that is, the total number of inversions 
in both rows of (6) is even; but if the parities of the rows in mode 
(6) are opposite, then the total number of inversions in these two 
rows is odd. Thus, permutation A is even if the total number of inver- 
sions in the two rows in any mode of notation is even, it is odd otherwise. 


Example. Let there be given a permutation of degree 5: 

$$\begin{pmatrix} 3 & 1 & 4 & 5 & 2 \\ 2 & 5 & 4 & 3 & 1 \end{pmatrix}$$

There are 4 inversions in the upper row, and 7 inversions in the lower row. 
The total number in the two rows is 11, and so the permutation is odd. 
Rewrite this permutation as 

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 1 & 2 & 4 & 3 \end{pmatrix}$$

The number of inversions in the upper row is 0, in the lower, 5; that is, the 
total number is again odd. Thus different modes of notation preserve the 
parity of the total number of inversions, but not the actual 
number of them. 
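
The parity rule for the two-row notation can be sketched as follows. The rows used are 3 1 4 5 2 over 2 5 4 3 1, which have 4 and 7 inversions as the example states; the function names `permutation_is_even` and `natural_form` are assumptions of this illustration.

```python
def inversions(arrangement):
    a = list(arrangement)
    return sum(1 for p in range(len(a)) for q in range(p + 1, len(a)) if a[p] > a[q])

def permutation_is_even(upper, lower):
    """A permutation is even when the total number of inversions in both rows is even."""
    return (inversions(upper) + inversions(lower)) % 2 == 0

def natural_form(upper, lower):
    """Reorder the columns so that the upper row becomes 1, 2, ..., n."""
    columns = sorted(zip(upper, lower))
    return [u for u, _ in columns], [l for _, l in columns]

upper, lower = [3, 1, 4, 5, 2], [2, 5, 4, 3, 1]
print(inversions(upper), inversions(lower))   # 4 7
print(natural_form(upper, lower))             # ([1, 2, 3, 4, 5], [5, 1, 2, 4, 3])
```

Reordering the columns changes the individual counts but not the parity of their sum, which is what makes the definition independent of the mode of notation.
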


We wish to indicate other ways, equivalent to those given above, 
of defining parities of permutations.* For this purpose we define 
multiplication of permutations, which is of great interest in itself. 
As we already know, a permutation of degree n is a one-to-one map- 
ping of the set of numbers 1, 2, ..., n onto itself. The result of a suc- 
cessive execution of two one-to-one mappings of the set 1, 2, ..., n 
onto itself will obviously again be a certain one-to-one mapping of 
the set onto itself, that is to say, a successive execution of two permu- 
tations of degree n leads to a certain very definite third permutation 
of degree n called the product of the first by the second. Thus, if we 
have the permutations of degree four, 

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 4 & 2 \end{pmatrix}$$

then 

$$AB = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \end{pmatrix}$$

In the permutation A, the symbol 1 goes into 3, but in B the symbol 
3 goes into 4, and so for AB the symbol 1 goes into 4, etc. 

* This material may be omitted in a first reading since it will be required 
only in Chapter 14. 

Multiplication is only possible with permutations of the same 
degree. Multiplication of permutations of degree n for $n \ge 3$ is 
noncommutative. Indeed, using A and B, the product BA yields 

$$BA = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix}$$

which shows that the permutation BA differs from the permutation 
AB. Such examples may be chosen for all n, $n \ge 3$, although for 
certain pairs of permutations, commutativity may accidentally be 
valid. 

The multiplication of permutations is associative; that is, we can 
speak of the product of any finite number of permutations of degree 
n taken (because of noncommutativity) in a definite order. Let there 
be given permutations A, B and C and let the symbol $i_1$, $1 \le i_1 \le n$, 
go to $i_2$ in A, $i_2$ to $i_3$ in B and $i_3$ to $i_4$ in the permutation C. Then in 
the permutation AB, $i_1$ goes to $i_3$, in BC the symbol $i_2$ goes to $i_4$, 
and therefore the symbol $i_1$ goes to $i_4$ whether we perform (AB) C 
or A (BC). 
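
Multiplication in the sense used here (first A, then B) is short to write down in code. In this sketch a permutation is stored as the lower row of its notation with the natural upper row; the three permutations below are illustrative values chosen so that 1 goes into 3 in A and 3 goes into 4 in B, and the function name `multiply` is introduced here.

```python
def multiply(a, b):
    """Product AB: the symbol i goes where A sends it, and then where B sends that.

    Each permutation is the tuple of images of 1, 2, ..., n.
    """
    return tuple(b[a[i] - 1] for i in range(len(a)))

A = (3, 1, 4, 2)   # illustrative: 1 goes into 3
B = (1, 3, 4, 2)   # illustrative: 3 goes into 4
C = (2, 1, 4, 3)   # a third, arbitrary permutation for the associativity check

print(multiply(A, B))   # (4, 1, 2, 3)
print(multiply(B, A))   # (3, 4, 2, 1)

# The order of the factors matters, but the grouping does not.
assert multiply(A, B) != multiply(B, A)
assert multiply(multiply(A, B), C) == multiply(A, multiply(B, C))
```
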

It is obvious that the product of any permutation A by the identity 
permutation E (and also the product of E by A) is equal to A: 


$$AE = EA = A$$

Let us now define the inverse of the permutation A as the 
permutation $A^{-1}$ of the same degree such that 

$$AA^{-1} = A^{-1}A = E$$


It is easy to see that the inverse of 

$$A = \begin{pmatrix} 1 & 2 & \ldots & n \\ \alpha_1 & \alpha_2 & \ldots & \alpha_n \end{pmatrix}$$

is the permutation 

$$A^{-1} = \begin{pmatrix} \alpha_1 & \alpha_2 & \ldots & \alpha_n \\ 1 & 2 & \ldots & n \end{pmatrix}$$

obtained from A by interchanging the upper and lower rows. 

Let us now examine permutations of a special kind which are 
obtained from the identity permutation E by means of a single 
transposition performed in the lower row. Such permutations are 
odd; they are termed transpositions and are of the form 

$$\begin{pmatrix} \ldots & i & \ldots & j & \ldots \\ \ldots & j & \ldots & i & \ldots \end{pmatrix} \tag{8}$$

where the dots stand for symbols that remain fixed. Let us agree 
to denote this transposition by the symbol (i, j). Application of the 
transposition of symbols i, j to the lower row of (7) of an arbitrary 
permutation A is equivalent to multiplying A on the right by the 
permutation (8), that is by (i, j). We know that all arrangements 
of n symbols may be obtained from one of them, say from 1, 2, 
..., n, by successive transpositions, and so any permutation may 
be obtained from the identity permutation by successive 
transpositions in the lower row, that is, by successive multiplication by 
permutations of the form (8). It can therefore be asserted (omitting 
the factor E) that any permutation can be represented as a product of 
transpositions. 
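
The argument above is constructive: start from the identity and transpose symbols in the lower row until the desired lower row appears. The sketch below follows that idea; the function names and the left-to-right order in which positions are fixed are assumptions of this illustration, and other choices yield other factorizations.

```python
def multiply(a, b):
    """Product AB (first A, then B) of permutations given by their lower rows."""
    return tuple(b[a[i] - 1] for i in range(len(a)))

def transposition(n, i, j):
    """The permutation (i, j) of degree n: i and j change places, the rest stay fixed."""
    lower = list(range(1, n + 1))
    lower[i - 1], lower[j - 1] = j, i
    return tuple(lower)

def factor(p):
    """Return transpositions (i, j) whose product, taken left to right, equals p."""
    current = list(range(1, len(p) + 1))      # lower row of the identity
    factors = []
    for pos, target in enumerate(p):
        if current[pos] != target:
            i, j = current[pos], target
            other = current.index(target)
            # Transposing the symbols i and j in the lower row is the same as
            # multiplying on the right by (i, j).
            current[pos], current[other] = current[other], current[pos]
            factors.append((i, j))
    return factors

p = (2, 5, 4, 3, 1)
print(factor(p))   # [(1, 2), (1, 5), (3, 4)]

# Multiplying the transpositions back together reproduces p.
product = tuple(range(1, len(p) + 1))
for i, j in factor(p):
    product = multiply(product, transposition(len(p), i, j))
assert product == p
```
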

Any permutation may be factored into a product of transposi- 
tions in many different ways. It is always possible, for example, to 
add two identical factors of the form (i, j) (i, j), which when mul- 
tiplied yield Æ, that is to say, cancel out. Let us take a somewhat 
less trivial instance: 


12345 
care 


This new way of defining the parity of a permutation is based 
on the following theorem. 

For all factorizations of a permutation into a product of transpo- 
sitions, the parity of the number of these transpositions is the same and 
coincides with the parity of the permutation. 

Thus, in the example given above, the permutation is odd, as 
may also be verified by counting the number of inversions. » 

This theorem will be proved if we demonstrate that the product 
of any k transpositions is a permutation whose parity coincides with 
the parity of the number k. For k = 1 this is true because a transpo- 
sition is an odd permutation. Let our assertion be proved for the 
case of k — 1 factors. Then its validity for k factors follows from 
the fact that the numbers k — 1 and k are of opposite parity and 
the multiplication of a permutation (in this case, the product of 
the first k - 1 factors) by a transposition is equivalent to this transposition
performed in the lower row of the permutation, which
is to say, it changes the parity. 
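
The theorem can be checked on a small case. The following Python sketch is an illustration added here, not part of the original text; the helper names `transposition`, `multiply`, `inversions` and `product_of_transpositions` are our own, and products are read from left to right, as in this section. It multiplies out a product of three transpositions and a product of five transpositions of degree 5, which turn out to give the same permutation, and counts the inversions of the result.

```python
from functools import reduce

def transposition(n, i, j):
    """Lower row of the transposition (i, j) of degree n on the symbols 1..n."""
    row = list(range(1, n + 1))
    row[i - 1], row[j - 1] = j, i
    return tuple(row)

def multiply(a, b):
    """Product ab read left to right: first perform a, then b."""
    return tuple(b[x - 1] for x in a)

def inversions(row):
    """Number of pairs of symbols standing in inverted order in a lower row."""
    n = len(row)
    return sum(1 for p in range(n) for q in range(p + 1, n) if row[p] > row[q])

def product_of_transpositions(n, pairs):
    """Multiply the transpositions in `pairs`, starting from the identity E."""
    identity = tuple(range(1, n + 1))
    return reduce(multiply, (transposition(n, i, j) for i, j in pairs), identity)

three = product_of_transpositions(5, [(1, 2), (1, 5), (3, 4)])
five = product_of_transpositions(5, [(1, 4), (2, 4), (4, 5), (3, 4), (1, 3)])
print(three, five, inversions(three))
```

Both factorizations have an odd number of factors, and the number of inversions of the common result is odd as well, in agreement with the theorem.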

Decomposition into cycles is a convenient way of writing permu- 
tations which makes it easy to find their parity. Any permutation 
of degree n can leave certain of the symbols 1, 2, ..., n fixed while
moving others. A cyclic permutation (or, simply, a cycle) is a permutation
such that when it is repeated a sufficient number of times
any one of the symbols it moves can be transformed into any other such symbol.
Such, for instance, is the permutation of degree eight 


$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 8 & 6 & 4 & 5 & 2 & 7 & 3 \end{pmatrix}$$


It transfers the symbols 2, 3, 6, and 8, with 2 going into 8, 8 into 
3, 3 into 6, and 6 again into 2. 

Every transposition is a cycle. By analogy with the earlier
used abbreviated notation for transpositions, the following notation 
is used for cycles: the symbols being transferred are enclosed in 
parentheses in the order in which they go into one another when the 
permutation is repeated; any transferable symbol can serve as the 
starting point, and the last one is that which goes into the first. 
Thus, for the example given above, this notation has the form 


(2 8 3 6) 


The number of symbols transferred by a cycle is called the cycle 
length. 

Two cycles of degree n are called disjoint if they do not have any 
common symbols subject to transfer. It is clear that in multiplica- 
tion of disjoint cycles, the order of the factors does not affect the 
result. 

Any permutation can be factored uniquely into a product of pair- 
wise disjoint cycles. The proof is simple and so we omit it. In actual 
practice, the factorization is accomplished in the following manner: 
begin with any one of the symbols subject to transfer and write out,
one after another, the symbols into which it goes as the permutation is
repeated, until you arrive at the original symbol. After thus “closing” the cycle, begin
with one of the remaining transferable symbols to obtain the second 
cycle, and so on. 
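
The procedure just described can be sketched in Python. This is an added illustration under our own conventions, not part of the original text: the function name `disjoint_cycles` is hypothetical, and a permutation is passed as its lower row, so that `lower_row[k - 1]` is the symbol into which k goes.

```python
def disjoint_cycles(lower_row):
    """Factor a permutation, given by its lower row, into disjoint cycles.

    Symbols held fixed are skipped, as agreed in the text.
    """
    seen = set()
    cycles = []
    for start in range(1, len(lower_row) + 1):
        if start in seen or lower_row[start - 1] == start:
            continue
        cycle = []
        current = start
        while current not in seen:       # follow the symbol until the cycle closes
            seen.add(current)
            cycle.append(current)
            current = lower_row[current - 1]
        cycles.append(tuple(cycle))
    return cycles

print(disjoint_cycles((1, 8, 6, 4, 5, 2, 7, 3)))   # the cycle of degree eight above
print(disjoint_cycles((5, 2, 8, 7, 6, 1, 4, 3)))
```

Each cycle is written starting from its smallest symbol; as noted above, any transferable symbol could serve as the starting point.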


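Examples.

(1) $$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 5 & 1 & 3 & 2 \end{pmatrix} = (143)(25)$$

(2) $$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 2 & 8 & 7 & 6 & 1 & 4 & 3 \end{pmatrix} = (156)(38)(47)$$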


Conversely, for any permutation specified by a decomposition into disjoint 
cycles, it is possible to find a notation in ordinary form, provided that the 
degree of the permutation is known. For example,

(3) $$(1372)(45) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 1 & 7 & 5 & 4 & 6 & 2 \end{pmatrix}$$

if it is known that the permutation is of degree 7.

Let there be given a permutation of degree n and let s be the number of
disjoint cycles in its decomposition plus the number of symbols which it holds
fixed*. The difference n - s is called the decrement of this permutation. The
decrement is obviously equal to the number of actually transferable symbols 
diminished by the number of disjoint cycles entering into the decomposition 
of the permutation. For Examples 1, 2, and 3 above, the decrement will be 
equal to 3, 4, and 4, respectively. 

The parity of a permutation coincides with the parity of the decrement of the
permutation.

Indeed, any cycle of length k may be represented in the following manner
as the product of k - 1 transpositions:

$$(i_1, i_2, \dots, i_k) = (i_1, i_2)(i_1, i_3) \cdots (i_1, i_k).$$


Let us now suppose we have an expansion of permutation A into disjoint cycles.
If each one of the cycles is factored by the indicated method into a product
of transpositions, we get a representation of permutation A in the form
of a product of transpositions. The number of these transpositions will obviously
be less than the number of symbols actually transferable by A by a number
equal to the number of disjoint cycles in the decomposition of the permutation.
Whence it follows that the permutation A may be factored into a product
of transpositions whose number is equal to the decrement, and for this reason
the parity of the permutation is determined by the parity of the decrement. 
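
The statement lends itself to a direct check. The Python sketch below is an added illustration, not part of the original text; the names `inversions` and `decrement` are our own, and a permutation is again given by its lower row. It computes the decrement n - s and compares its parity with the parity of the number of inversions for every permutation of degree 5.

```python
from itertools import permutations

def inversions(row):
    """Number of inversions in the lower row of a permutation."""
    n = len(row)
    return sum(1 for p in range(n) for q in range(p + 1, n) if row[p] > row[q])

def decrement(row):
    """n - s, where s counts the disjoint cycles plus the fixed symbols."""
    n = len(row)
    seen = [False] * n
    s = 0
    for start in range(n):
        if seen[start]:
            continue
        s += 1                      # a fixed symbol is counted like a cycle of length 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = row[k] - 1
    return n - s

print(decrement((5, 2, 8, 7, 6, 1, 4, 3)), decrement((3, 1, 7, 5, 4, 6, 2)))
same_parity = all(decrement(p) % 2 == inversions(p) % 2
                  for p in permutations(range(1, 6)))
print(same_parity)
```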


4. Determinants of nth Order


We now wish to generalize the results obtained in Sec. 2 for n = 2 
and n = 3 to the case of an arbitrary n. For this purpose, we have 
to introduce determinants of order n. However, it is not possible 
to do that the way we introduced determinants of order two and 
three, that is by solving a system of linear equations in the general 
form: as n increased, the computations would become progressively 
more unwieldy, and totally unmanageable for arbitrary n. We choose 
a different approach. Considering the determinants of order two 
and three which we are already familiar with, let us attempt to 
establish a general law expressing these determinants in terms of 
the elements of the corresponding matrices, and then let us apply 
that law as a definition for an nth-order determinant. After that we 
will prove that Cramer’s rule holds true under such a definition. 

Recall the expressions for determinants of order two and three: 


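$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}$$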

We see that any term of a second-order determinant is a product
of two elements which lie in different rows and also in different
columns, and also that all products of this type that may be formed
from the elements of a second-order matrix (two altogether) are
utilized as terms of the determinant. Similarly, every term of a
third-order determinant is a product of three elements, also taken
one in each row and each column; again, all such products are utilized
as terms of the determinant.

* With every symbol which the permutation holds fixed it is possible
to associate a “cycle” of length 1; for instance, in Example 2 above we could write
(156) (38) (47) (2). But we shall not do that.

Let us now take a square matrix of order n: 


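$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} \tag{1}$$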


We consider all possible products of the n elements of this matrix 
located in different rows and different columns, that is, products 
of the form 


$$a_{1\alpha_1} a_{2\alpha_2} \dots a_{n\alpha_n} \tag{2}$$

where the subscripts $\alpha_1, \alpha_2, \dots, \alpha_n$ constitute an arrangement of
the numbers 1, 2, ..., n. The number of such products is equal
to the number of different arrangements of n symbols, or n!. We consider
all these products as terms of the future nth-order determinant
associated with the matrix (1).

To determine the sign affixed to product (2) in the determinant,
note that, using the subscripts of this product, we can form the
permutation

$$\begin{pmatrix} 1 & 2 & \dots & n \\ \alpha_1 & \alpha_2 & \dots & \alpha_n \end{pmatrix} \tag{3}$$

where i goes into $\alpha_i$ if the element in the ith row and $\alpha_i$th column
of matrix (1) enters into the product (2). Examining expressions
of determinants of second and third order, we note that the plus 
sign is affixed to the terms whose subscripts constitute an even 
permutation, and the minus sign to those terms with an odd permu- 
tation of subscripts. It is also natural to retain this regularity in the 
definition of a determinant of order n. 

We thus arrive at the following definition: the nth-order deter- 
minant associated with matrix (1) is the algebraic sum of n! terms 
which is constructed in the following fashion: the terms are all 
possible products of the n elements of the matrix taken one in each 
row and each column, the term having a plus sign if its subscripts 
form an even permutation, and a minus sign otherwise. 
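
The definition translates almost word for word into a short program. The Python sketch below is an added illustration, not part of the original text; the names `sign` and `det` are our own, and rows and columns are numbered from 0 rather than from 1, which does not affect the parity of a permutation of the subscripts. It is meant only to make the definition concrete: it forms all n! terms and is therefore practical only for small n.

```python
from itertools import permutations
from math import prod

def sign(alpha):
    """+1 for an even arrangement of subscripts, -1 for an odd one."""
    n = len(alpha)
    inv = sum(1 for p in range(n) for q in range(p + 1, n) if alpha[p] > alpha[q])
    return -1 if inv % 2 else 1

def det(matrix):
    """Algebraic sum of the n! products taken one element in each row and column."""
    n = len(matrix)
    return sum(sign(alpha) * prod(matrix[i][alpha[i]] for i in range(n))
               for alpha in permutations(range(n)))

print(det([[5]]), det([[1, 2], [3, 4]]), det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
```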

For the notation of the nth-order determinant associated with 
matrix (1) we will, as in the case of determinants of order two and 
three, use the symbol

$$\begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} \tag{4}$$


Determinants of the nth order become determinants of order two 
and three, for n = 2 and n = 3; for n = 1, that is, for matrices 
consisting of a single element, the determinant is equal to that 
element. So far we do not know whether it is possible, for n > 3, 
to use the nth-order determinant for solving systems of linear equa- 
tions. That will be shown in Sec. 7. It will be necessary first to subject 
the nth-order determinants to a detailed study and, in particular, 
it will be necessary to find procedures for evaluating them, since 
to compute a determinant directly (via its definition), even for n
not very large, would be extremely complicated. 

For the present let us establish some of the simpler properties 
of nth-order determinants that refer mainly to one of the two follow- 
ing problems: on the one hand, we are interested in the conditions 
under which a determinant is equal to zero, on the other, we will 
indicate certain matrix transformations which leave its determi- 
nant unchanged or result in readily perceivable alterations. 

The transpose operation with respect to matrix (1) is a transformation
of the matrix in which its rows become columns with the
same subscripts; in other words, it is a transition from matrix (1)
to the matrix

$$\begin{pmatrix} a_{11} & a_{21} & \dots & a_{n1} \\ a_{12} & a_{22} & \dots & a_{n2} \\ \dots & \dots & \dots & \dots \\ a_{1n} & a_{2n} & \dots & a_{nn} \end{pmatrix} \tag{5}$$

or we can say that the transpose is obtained by flipping matrix (1)
over the principal diagonal. Accordingly, we say that the determinant

$$\begin{vmatrix} a_{11} & a_{21} & \dots & a_{n1} \\ a_{12} & a_{22} & \dots & a_{n2} \\ \dots & \dots & \dots & \dots \\ a_{1n} & a_{2n} & \dots & a_{nn} \end{vmatrix} \tag{6}$$

is obtained by taking the transpose of the determinant (4).

Property 1. Taking the transpose does not change the determinant.

Indeed, every term of determinant (4) is of the form

$$a_{1\alpha_1} a_{2\alpha_2} \dots a_{n\alpha_n} \tag{7}$$

where the second subscripts form an arrangement of the symbols
1, 2, ..., n. However, all the factors of product (7) remain in
different rows and different columns in determinant (6) as well;
hence, (7) serves as a term of the transpose of the determinant too.
The converse is also obviously true and for this reason the determinants
(4) and (6) consist of the same terms. The sign of the term
(7) in determinant (4) is determined by the parity of the permutation

$$\begin{pmatrix} 1 & 2 & \dots & n \\ \alpha_1 & \alpha_2 & \dots & \alpha_n \end{pmatrix} \tag{8}$$

In determinant (6) the first subscripts of the elements indicate the
column, the second subscripts the row, and so term (7) in determinant
(6) is associated with the permutation

$$\begin{pmatrix} \alpha_1 & \alpha_2 & \dots & \alpha_n \\ 1 & 2 & \dots & n \end{pmatrix} \tag{9}$$

In the general case, the permutations (8) and (9) are different but
they obviously have the same parity and so term (7) has the same 
sign in both determinants. Thus the determinants (4) and (6) are 
sums of the same terms taken with the same signs, that is, they are 
equal. 

From Property 1 it follows that any assertion about rows holds 
true for the columns of a determinant and conversely; in other words, 
in contrast to a matrix, in a determinant the rows and columns are of 
equal status. We will therefore formulate and prove Properties 2 to 9 
only for the rows of a determinant; analogous properties for columns 
will not require special proof. 

Property 2. If one of the rows of a determinant consists of zeros, 
the determinant is zero. 

Indeed, let all the elements of the ith row of a determinant be 
zeros. Every term of the determinant must have, as a factor, one 
element of the ith row, and so in our case all the terms of the deter- 
minant are zero. 

Property 3. If a determinant is obtained from another one by 
interchanging two rows, then all terms of the first determinant will
be terms of the second but with signs reversed; which means that inter- 
changing two rows of a determinant only changes the sign. 


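Suppose, in determinant (4), the ith and jth rows (i ≠ j) are
interchanged and all other rows remain fixed. We get the determinant

$$\begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \dots & \dots & \dots & \dots \\ a_{j1} & a_{j2} & \dots & a_{jn} \\ \dots & \dots & \dots & \dots \\ a_{i1} & a_{i2} & \dots & a_{in} \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} \begin{matrix} {} \\ {} \\ (i) \\ {} \\ (j) \\ {} \\ {} \end{matrix} \tag{10}$$

(row numbers indicated on the right). If

$$a_{1\alpha_1} a_{2\alpha_2} \dots a_{n\alpha_n} \tag{11}$$

is a term of (4), then all its factors in (10) as well obviously remain
in different rows and different columns. Thus, determinants (4)
and (10) consist of the same terms. Term (11) in determinant (4) is
associated with the permutation

$$\begin{pmatrix} 1 & 2 & \dots & i & \dots & j & \dots & n \\ \alpha_1 & \alpha_2 & \dots & \alpha_i & \dots & \alpha_j & \dots & \alpha_n \end{pmatrix} \tag{12}$$

and in determinant (10) with the permutation

$$\begin{pmatrix} 1 & 2 & \dots & j & \dots & i & \dots & n \\ \alpha_1 & \alpha_2 & \dots & \alpha_i & \dots & \alpha_j & \dots & \alpha_n \end{pmatrix} \tag{13}$$

since, for example, element $a_{i\alpha_i}$ now lies in the jth row but remains
in the old $\alpha_i$th column. The permutation (13) however is obtained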
from (12) via a single transposition in the upper row; it thus has 
opposite parity. Whence it follows that all terms of determinant (4) 
enter into determinant (10) with opposite signs. Determinants (4) 
and (10) differ in sign alone. 

Property 4. A determinant containing two identical rows is equal 
to zero. 

Indeed, let a determinant be equal to the number d and let
the corresponding elements of its ith and jth rows (i ≠ j) be equal.
By Property 3, after an interchange of these two rows, the determinant
will be equal to the number -d. But since identical rows are
interchanged, the determinant does not actually change; thus,
d = -d, whence d = 0.

Property 5. If all the elements of some row of a determinant are
multiplied by some number k, then the determinant itself is multiplied
by k.

Let all elements of the ith row be multiplied by k. Each term
of the determinant contains exactly one element of the ith row,
therefore every term acquires the factor k, which means the determinant
itself is multiplied by k.

This property admits of the following formulation as well: 
a common factor of all elements of some row of a determinant may be 
factored out of the determinant. 

Property 6. A determinant with two proportional rows is equal 
to zero. 

Let the elements of the jth row of a determinant differ from the
corresponding elements of the ith row (i ≠ j) by one and the same
factor k. Factoring this common factor k out of the jth row of the
determinant, we obtain a determinant with two identical rows, which
by Property 4 is zero.

Property 4 (and also Property 2 for n > 1) is obviously a special
case of Property 6 (for k = 1 and k = 0).

Property 7. If all the elements of the ith row of a determinant
of order n are given as a sum of two terms:

$$a_{ij} = b_j + c_j, \quad j = 1, 2, \dots, n,$$

then the determinant is equal to the sum of two determinants in which
all rows (except the ith) are the same as in the given determinant and
the ith row in one of the summands consists of the elements $b_j$ and in
the other, of the elements $c_j$.

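Indeed, any term of the given determinant may be represented
in the form

$$a_{1\alpha_1} a_{2\alpha_2} \dots a_{i\alpha_i} \dots a_{n\alpha_n} = a_{1\alpha_1} a_{2\alpha_2} \dots (b_{\alpha_i} + c_{\alpha_i}) \dots a_{n\alpha_n} = a_{1\alpha_1} a_{2\alpha_2} \dots b_{\alpha_i} \dots a_{n\alpha_n} + a_{1\alpha_1} a_{2\alpha_2} \dots c_{\alpha_i} \dots a_{n\alpha_n}$$

Collecting together the first summands of these sums (with the same
signs as the corresponding terms had in the given determinant)
we evidently obtain an nth-order determinant which differs from
the given determinant solely in the fact that the ith row has elements
$b_j$ in place of elements $a_{ij}$. Accordingly, the second summands
form a determinant in the ith row of which are the elements $c_j$. Thus

$$\begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \dots & \dots & \dots & \dots \\ b_1 + c_1 & b_2 + c_2 & \dots & b_n + c_n \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \dots & \dots & \dots & \dots \\ b_1 & b_2 & \dots & b_n \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \dots & \dots & \dots & \dots \\ c_1 & c_2 & \dots & c_n \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}$$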


Property 7 is readily extended to the case when any element of 
the ith row is a sum of m summands, not two, m > 2. 

We shall say that the ith row of a determinant is a linear combination
of the remaining rows if for every row with subscript j,
j = 1, ..., i - 1, i + 1, ..., n, there exists a number $k_j$ such
that when the jth row is multiplied by $k_j$ and then all the rows
except the ith are added together (addition of rows is to be understood
in the sense that the elements of the row are added in each
column separately), we obtain the ith row. Some of the coefficients
$k_j$ may be zero, that is the ith row will actually be a linear combination
not of all but only of a few of the remaining rows. In particular,
if only one of the coefficients $k_j$ is different from zero, we get the
case of proportionality of two rows. Finally, if the row consists
entirely of zeros, it will always be a linear combination of the
remaining rows: this is the case when all $k_j$ are zero.

Property 8. If one of the rows of a determinant is a linear combi- 
nation of the other rows, then the determinant is zero. 

For example, let the ith row be a linear combination of s other 
rows, 1 ≤ s ≤ n - 1. Then every element of the ith row will be
a sum of s summands, and for this reason, using Property 7, we can 
represent our determinant in the form of a sum of determinants in 
each of which the ith row will be proportional to one of the other 
rows. By Property 6, all these determinants are zero; hence the 
given determinant is zero as well. 

This property is a generalization of Property 6 and, as will be 
proved in Sec. 10, it provides the most general case of a zero deter- 
minant. 

Property 9. A determinant remains unchanged if to the elements 
of one of its rows we add corresponding elements of another row mul- 
tiplied by the same number. 

Suppose to the ith row of determinant d we add the jth row,
j ≠ i, multiplied by the number k; that is, in the new determinant
every element of the ith row will be of the form $a_{is} + k a_{js}$,
s = 1, 2, ..., n. Then, by Property 7, this determinant is equal
to the sum of two determinants, the first of which is d and the second
of which contains two proportional rows and is therefore zero.

Since the number k may also be negative, the determinant does 
not change even if we subtract from one of its rows a row multiplied
by some number. Generally, a determinant remains unchanged if to 
one of its rows we add any linear combination of the other rows. 
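
Several of these properties can be observed numerically. The Python sketch below is an added illustration, not part of the original text; the function `det` is our own direct implementation of the definition of Sec. 4, and the 3 by 3 matrix `a` is an arbitrary sample chosen for the purpose. It forms the transpose (Property 1), interchanges two rows (Property 3), multiplies a row by 7 (Property 5) and adds three times the third row to the first (Property 9).

```python
from itertools import permutations
from math import prod

def det(m):
    """Determinant straight from the definition: a signed sum over permutations."""
    n = len(m)
    total = 0
    for alpha in permutations(range(n)):
        inv = sum(1 for p in range(n) for q in range(p + 1, n) if alpha[p] > alpha[q])
        total += (-1) ** inv * prod(m[i][alpha[i]] for i in range(n))
    return total

a = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]                        # arbitrary sample matrix
transposed = [list(col) for col in zip(*a)]                     # Property 1
swapped = [a[1], a[0], a[2]]                                    # Property 3
scaled = [a[0], [7 * x for x in a[1]], a[2]]                    # Property 5
added = [[x + 3 * y for x, y in zip(a[0], a[2])], a[1], a[2]]   # Property 9
print(det(a), det(transposed), det(swapped), det(scaled), det(added))
```

A single sample matrix proves nothing, of course; the check merely shows each property at work.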


Let us consider an example. A determinant is called skew-symmetric if the
elements symmetric about the principal diagonal differ in sign alone, that
is, if for all i and j it is true that $a_{ji} = -a_{ij}$, whence it follows that for all
i it is true that $a_{ii} = -a_{ii} = 0$. Thus, the determinant is of the form

$$d = \begin{vmatrix} 0 & a_{12} & a_{13} & \dots & a_{1n} \\ -a_{12} & 0 & a_{23} & \dots & a_{2n} \\ -a_{13} & -a_{23} & 0 & \dots & a_{3n} \\ \dots & \dots & \dots & \dots & \dots \\ -a_{1n} & -a_{2n} & -a_{3n} & \dots & 0 \end{vmatrix}$$

Multiplying each row of this determinant by -1, we obtain the transpose of the
determinant, which is again equal to d, whence, by Property 5, it follows that

$$(-1)^n d = d$$

It then follows, for odd n, that -d = d, or d = 0. Thus any skew-symmetric
determinant of odd order is equal to zero. 
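
A quick numerical check of this example, added here as an illustration and not part of the original text, can be sketched in Python. The helper `skew_symmetric` is a hypothetical name of our own: it builds the matrix from the elements above the principal diagonal, listed row by row, and `det` evaluates the determinant from the definition of Sec. 4.

```python
from itertools import permutations
from math import prod

def det(m):
    """Determinant from the definition (signed sum over all permutations)."""
    n = len(m)
    total = 0
    for alpha in permutations(range(n)):
        inv = sum(1 for p in range(n) for q in range(p + 1, n) if alpha[p] > alpha[q])
        total += (-1) ** inv * prod(m[i][alpha[i]] for i in range(n))
    return total

def skew_symmetric(n, upper):
    """n-by-n matrix with a_ji = -a_ij, filled from the entries above the diagonal."""
    values = iter(upper)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = next(values)
            m[j][i] = -m[i][j]
    return m

print(det(skew_symmetric(3, [1, 2, 3])), det(skew_symmetric(5, range(1, 11))))
print(det(skew_symmetric(2, [3])))   # even order: the argument above does not apply
```

For even n the relation $(-1)^n d = d$ gives no information, and indeed the second-order example is not zero.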


5. Minors and Their Cofactors 


We have already pointed out that it would be difficult to com- 
pute an nth-order determinant by applying the definition directly, 
that is every time writing out all n! terms, determining their signs, 
etc. There are simpler methods for evaluating determinants. They 
are based on the fact that a determinant of order n may be expressed 
in terms of a determinant of lower order. For this purpose we intro- 
duce the following notion. 

Let there be a determinant d of order n. Take an integer k which 
satisfies the condition 1 ≤ k ≤ n - 1, and in the determinant d
choose arbitrary k rows and k columns. The elements which lie at 
the intersection of these rows and columns, that is, which belong 
to one of the chosen rows and to one of the chosen columns will 
obviously form a matrix of order k. The determinant of this matrix 
is called a minor of order k of the determinant d. We can also say 
that the kth-order minor is a determinant obtained by striking out
n — k rows and n — k columns in d. In particular, after striking 
out one row and one column in the determinant we obtain a minor 
of (n — 1)th order; on the other hand, separate elements of deter- 
minant d will be minors of the first order. 

Let us take a minor M of order k in a determinant d of order n. 
If we strike out the rows and columns at the intersection of which 
this minor stands, we obtain the minor M’ of order (n — k) which 
is called the complementary minor of the minor M. If, on the con- 
trary, we strike out the rows and columns which contain elements of 
the minor M’, then what remains is obviously minor M. Thus, we 
can speak of a pair of complementary minors of the determinant. In 
particular, the element $a_{ij}$ and the minor of order (n - 1) obtained
by striking out the ith row and the jth column in the determinant 
will form a pair of complementary minors. 

If a kth-order minor M is located in rows with the position numbers
(indices) $i_1, i_2, \dots, i_k$ and in columns with the position numbers
$j_1, j_2, \dots, j_k$, then we use the term cofactor of the minor M
for the complementary minor M’ taken with a plus or minus sign
according as the sum of the position numbers of all rows and columns
in which M is located is even or odd, that is, the sum

$$s_M = i_1 + i_2 + \dots + i_k + j_1 + j_2 + \dots + j_k \tag{1}$$

In other words, the cofactor of M is the number $(-1)^{s_M} M'$.




The product of any minor M of order k by its cofactor in a determinant
d is an algebraic sum whose summands, which are obtained by
multiplying the terms of the minor M by the terms of the complementary
minor M’ taken with the sign $(-1)^{s_M}$, are certain terms of the determinant
d; their signs in this sum coincide with the signs they have in
the determinant.

We begin the proof of this theorem with the case when the minor
M is located in the upper left corner of the determinant:

$$d = \left|\begin{array}{ccc|ccc} a_{11} & \dots & a_{1k} & a_{1,k+1} & \dots & a_{1n} \\ \dots & M & \dots & \dots & \dots & \dots \\ a_{k1} & \dots & a_{kk} & a_{k,k+1} & \dots & a_{kn} \\ \hline a_{k+1,1} & \dots & a_{k+1,k} & a_{k+1,k+1} & \dots & a_{k+1,n} \\ \dots & \dots & \dots & \dots & M' & \dots \\ a_{n1} & \dots & a_{nk} & a_{n,k+1} & \dots & a_{nn} \end{array}\right|$$

that is, in rows with position numbers 1, 2, ..., k and in columns
with the same position numbers. Then the minor M’ will occupy
the lower right corner of the determinant. The number $s_M$ will then
be even:

$$s_M = 1 + 2 + \dots + k + 1 + 2 + \dots + k = 2(1 + 2 + \dots + k)$$

therefore, the minor M’ itself will serve as the cofactor of M.
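Take an arbitrary term

$$a_{1\alpha_1} a_{2\alpha_2} \dots a_{k\alpha_k} \tag{2}$$

of minor M; its sign in M is $(-1)^l$ if l is the number of
inversions in the permutation

$$\begin{pmatrix} 1 & 2 & \dots & k \\ \alpha_1 & \alpha_2 & \dots & \alpha_k \end{pmatrix} \tag{3}$$

An arbitrary term

$$a_{k+1,\beta_{k+1}} a_{k+2,\beta_{k+2}} \dots a_{n\beta_n} \tag{4}$$

of minor M’ has, in this minor, the sign $(-1)^{l'}$ where l’ is the number of
inversions in the permutation

$$\begin{pmatrix} k+1 & k+2 & \dots & n \\ \beta_{k+1} & \beta_{k+2} & \dots & \beta_n \end{pmatrix} \tag{5}$$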


Multiplying the terms (2) and (4), we obtain a product of n
elements

$$a_{1\alpha_1} a_{2\alpha_2} \dots a_{k\alpha_k} a_{k+1,\beta_{k+1}} a_{k+2,\beta_{k+2}} \dots a_{n\beta_n} \tag{6}$$

located in different rows and different columns of the determinant.
It is therefore a term of determinant d. The sign of term (6) in the
product MM’ is a product of the signs of terms (2) and (4), i.e.,
$(-1)^l \cdot (-1)^{l'} = (-1)^{l+l'}$. However, term (6) has the same sign in
the determinant d as well. Indeed, the lower row of the permutation

$$\begin{pmatrix} 1 & 2 & \dots & k & k+1 & k+2 & \dots & n \\ \alpha_1 & \alpha_2 & \dots & \alpha_k & \beta_{k+1} & \beta_{k+2} & \dots & \beta_n \end{pmatrix}$$

made up of the subscripts of this term contains only l + l’ inversions,
since no α can form an inversion with any one of the β: all α
do not exceed k, all β are not less than k + 1.

This proves the particular case of the theorem that we have 
considered. Let us now take up the general case. Suppose that the 
minor M lies in the rows with position numbers $i_1, i_2, \dots, i_k$ and
in the columns with position numbers $j_1, j_2, \dots, j_k$, with the
condition that

$$i_1 < i_2 < \dots < i_k, \quad j_1 < j_2 < \dots < j_k$$


Let us attempt, by interchanging rows and columns of the determinant,
to move the minor M to the upper left corner and let us try
to do this so that the complementary minor is not changed. For
this purpose, interchange the $i_1$th row with the $(i_1 - 1)$th, then
with the $(i_1 - 2)$th and so on until the $i_1$th row occupies the first
row; this requires interchanging the rows $i_1 - 1$ times. Then we
successively interchange the $i_2$th row with rows located above it
until it lies directly under the $i_1$th row (that is, in the position of
the original second row); this, as can readily be verified, will require
interchanging the rows $i_2 - 2$ times. Similarly, we move the $i_3$th
row to the third row, and so on, until the $i_k$th row takes up the
position of the kth row. In all, we will have to perform

$$(i_1 - 1) + (i_2 - 2) + \dots + (i_k - k) = (i_1 + i_2 + \dots + i_k) - (1 + 2 + \dots + k)$$

transpositions of rows.

The minor M is thus located in the first k rows of the new determinant.
We will now successively interchange the columns of
the determinant, the $j_1$th column with all preceding ones, until it
occupies first place, then the $j_2$th column until it occupies second
place, and so forth. In all, the columns will be interchanged

$$(j_1 + j_2 + \dots + j_k) - (1 + 2 + \dots + k)$$

times.

All these transformations lead us to a new determinant d’ in which
the minor M occupies the upper left corner. Since each time we
interchanged only adjacent rows or columns, the mutual positions of
the rows and columns containing the minor M’ in the determinant d
remain without change, and so the minor M’ remains complementary
to the minor M in the determinant d’; however, it now occupies the
lower right corner. As was proved above, the product MM’ is the
sum of some number of terms of the determinant d’ taken with the
same signs as they had in d’. However, the determinant d’ is obtained
from the determinant d by means of

$$[(i_1 + i_2 + \dots + i_k) - (1 + 2 + \dots + k)] + [(j_1 + j_2 + \dots + j_k) - (1 + 2 + \dots + k)] = s_M - 2(1 + 2 + \dots + k)$$

transpositions of rows and columns, and so, as we know from Sec. 4,
the terms of determinant d’ differ from the corresponding terms of
determinant d only by the sign factor $(-1)^{s_M}$ [naturally, the even number
2 (1 + 2 + ... + k) will not affect the sign]. From this it follows
that the product $(-1)^{s_M} MM'$ consists of a certain number of terms
of the determinant d taken with the same signs as they have in that
determinant. The theorem is proved.

Note that if the minors M and M’ are complementary, then
the numbers $s_M$ and $s_{M'}$ have the same parity. Indeed, the position
number of any row and any column enters as a summand in one and
only one of these numbers, and therefore the sum $s_M + s_{M'}$ is equal
to the total sum of the position numbers of all rows and columns of
the determinant, i.e., it is equal to the even number 2 (1 + 2 + ... + n).


6. Evaluating Determinants


The results of the preceding section enable us to reduce computing
an nth-order determinant to the computation of several determinants
of order (n - 1). Let us first introduce notation: if $a_{ij}$ is an
element of determinant d, then $M_{ij}$ denotes the complementary minor,
or, simply, the minor of that element, that is, the minor of order
(n - 1) obtained by striking out the ith row and the jth column of
the determinant. $A_{ij}$ will denote the cofactor of the element $a_{ij}$; thus,

$$A_{ij} = (-1)^{i+j} M_{ij}$$

As was proved in the preceding section, the product $a_{ij}A_{ij}$ is
the sum of several terms of the determinant d which enter into this
sum with the same signs as they have in the determinant d. It is
easy to count these terms: the number is equal to the number of
terms in the minor $M_{ij}$, or (n - 1)!.

Let us now choose any ith row of the determinant d and take
the product of each element of the row by its cofactor:

$$a_{i1}A_{i1}, \quad a_{i2}A_{i2}, \quad \dots, \quad a_{in}A_{in} \tag{1}$$

No term of the determinant d can be in two different products of
those given in (1): all the terms of the determinant which enter
into the product $a_{i1}A_{i1}$ contain the element $a_{i1}$ of the ith row and
for this reason differ from the terms which enter into the product
$a_{i2}A_{i2}$, that is, those which contain the element $a_{i2}$ of the ith row,
and so on.

On the other hand, the total number of terms of determinant d 
which appear in all the products of (1) is equal to 


$$(n - 1)! \cdot n = n!$$


Consequently, this exhausts all the terms of the determinant d. We have
thus proved that there is an expansion of the determinant d in terms
of the ith row:

$$d = a_{i1}A_{i1} + a_{i2}A_{i2} + \dots + a_{in}A_{in} \tag{2}$$

The determinant d is thus equal to the sum of the products of all the
elements of an arbitrary row by their cofactors. A similar expansion
of the determinant can also be obtained about any column.

By replacing the cofactors in the expansion (2) by corresponding 
minors with a plus or a minus sign, we reduce computation of an 
nth-order determinant to the computation of several determinants 
of order (n — 1). Note that if some of the elements of the ith row 
are zero, then naturally the corresponding minors need not be 
evaluated. It is therefore useful, first, to transform the determinant, 
using Property 9 (see Sec. 4), so that a large enough number of 
elements in one of the rows or in one of the columns are replaced 
by zeros. Actually, Property 9 enables us to replace all elements,
except one, by zeros in any row or any column. Indeed, if $a_{ik} \neq 0$,
then any element $a_{ij}$, j ≠ k, of the ith row will be replaced by
a zero after subtracting the kth column multiplied by $a_{ij}/a_{ik}$ from
the jth column. Thus, evaluating a determinant of the nth order
may be reduced to computing a single determinant of order (n - 1).


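Example 1. Evaluate the fourth-order determinant

$$d = \begin{vmatrix} 3 & 1 & -1 & 2 \\ -5 & 1 & 3 & -4 \\ 2 & 0 & 1 & -1 \\ 1 & -5 & 3 & -3 \end{vmatrix}$$

Expand it about the third row by using the zero in that row:

$$d = (-1)^{3+1} \cdot 2 \cdot \begin{vmatrix} 1 & -1 & 2 \\ 1 & 3 & -4 \\ -5 & 3 & -3 \end{vmatrix} + (-1)^{3+3} \cdot 1 \cdot \begin{vmatrix} 3 & 1 & 2 \\ -5 & 1 & -4 \\ 1 & -5 & -3 \end{vmatrix} + (-1)^{3+4} \cdot (-1) \cdot \begin{vmatrix} 3 & 1 & -1 \\ -5 & 1 & 3 \\ 1 & -5 & 3 \end{vmatrix}$$

Evaluating the third-order determinants thus obtained, we get

$$d = 2 \cdot 16 - 40 + 48 = 40$$
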
Example 2. Evaluate the fifth-order determinant

$$d = \begin{vmatrix} -2 & 5 & 0 & -1 & 3 \\ 1 & 0 & 3 & 7 & -2 \\ 3 & -1 & 0 & 5 & -5 \\ 2 & 6 & -4 & 1 & 2 \\ 0 & -3 & -1 & 2 & 3 \end{vmatrix}$$

Adding three times the fifth row to the second and subtracting four times the
fifth row from the fourth row, we get

$$d = \begin{vmatrix} -2 & 5 & 0 & -1 & 3 \\ 1 & -9 & 0 & 13 & 7 \\ 3 & -1 & 0 & 5 & -5 \\ 2 & 18 & 0 & -7 & -10 \\ 0 & -3 & -1 & 2 & 3 \end{vmatrix}$$

Expanding this determinant in terms of the third column, which contains only
one nonzero element (with the sum of subscripts, 5 + 3, being even), we get

$$d = (-1) \cdot \begin{vmatrix} -2 & 5 & -1 & 3 \\ 1 & -9 & 13 & 7 \\ 3 & -1 & 5 & -5 \\ 2 & 18 & -7 & -10 \end{vmatrix}$$

We now transform this determinant by adding two times the second row to
the first row and subtracting three times the second from the third row, and two
times the second from the fourth:

$$d = - \begin{vmatrix} 0 & -13 & 25 & 17 \\ 1 & -9 & 13 & 7 \\ 0 & 26 & -34 & -26 \\ 0 & 36 & -33 & -24 \end{vmatrix}$$

and then expand it in terms of the first column. Noting that the only nonzero
element of this column is associated with an odd sum of subscripts, we get

$$d = \begin{vmatrix} -13 & 25 & 17 \\ 26 & -34 & -26 \\ 36 & -33 & -24 \end{vmatrix}$$

Let us compute this third-order determinant after expanding it in terms of the
third row:

$$d = 36 \cdot \begin{vmatrix} 25 & 17 \\ -34 & -26 \end{vmatrix} - (-33) \cdot \begin{vmatrix} -13 & 17 \\ 26 & -26 \end{vmatrix} + (-24) \cdot \begin{vmatrix} -13 & 25 \\ 26 & -34 \end{vmatrix} = 36 \cdot (-72) - (-33) \cdot (-104) + (-24) \cdot (-208) = -1032$$
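
The expansion about a row is easily turned into a recursive procedure. The Python sketch below is an added illustration, not part of the original text; the names `minor` and `expand` are our own, rows and columns are numbered from 0 (which leaves the sign $(-1)^{i+j}$ unchanged), and the two matrices are those of Examples 1 and 2 as read from the text. Zero elements are skipped, as recommended above, but no preliminary transformation by Property 9 is attempted.

```python
def minor(m, i, j):
    """Matrix left after striking out row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def expand(m, i=0):
    """Determinant by expansion about row i; zero elements are skipped."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * expand(minor(m, i, j))
               for j in range(len(m)) if m[i][j] != 0)

example_1 = [[3, 1, -1, 2], [-5, 1, 3, -4], [2, 0, 1, -1], [1, -5, 3, -3]]
example_2 = [[-2, 5, 0, -1, 3], [1, 0, 3, 7, -2], [3, -1, 0, 5, -5],
             [2, 6, -4, 1, 2], [0, -3, -1, 2, 3]]
print(expand(example_1, 2), expand(example_2))
```

Whichever row is chosen for the expansion, the value obtained is the same.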


Example 3. If all the elements of a determinant located on one side of the 
principal diagonal are equal to zero, then the determinant is equal to the product 
of the elements on the principal diagonal. 

This assertion is obvious for a second-order determinant. We therefore
prove it by induction, that is, we assume that for determinants of order (n - 1)
it has been proved, and then we consider the nth-order determinant

$$d = \begin{vmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ 0 & a_{22} & a_{23} & \dots & a_{2n} \\ 0 & 0 & a_{33} & \dots & a_{3n} \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & a_{nn} \end{vmatrix}$$

Expanding it in terms of the first column, we get

$$d = a_{11} \begin{vmatrix} a_{22} & a_{23} & \dots & a_{2n} \\ 0 & a_{33} & \dots & a_{3n} \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & a_{nn} \end{vmatrix}$$

But the induction hypothesis is applicable to the minor on the right-hand side:
it is equal to the product $a_{22}a_{33} \dots a_{nn}$ and so

$$d = a_{11}a_{22}a_{33} \dots a_{nn}$$

Example 4. The Vandermonde determinant is the determinant 


E He E uie A 


=| a? 2 a 2 
d=| af az a a? 
n-1 gN-1 gn-ı n-1 

Gy az a3 os an 


We shall prove that for any n the Vandermonde determinant is equal to the
product of all possible differences $a_i - a_j$, where $1 \le j < i \le n$. Indeed, for
n = 2 we have
\[
\begin{vmatrix} 1 & 1 \\ a_1 & a_2 \end{vmatrix} = a_2 - a_1
\]


Suppose our assertion has already been proved for Vandermonde determinants
of order (n - 1). We transform determinant d as follows: subtract from the
nth (last) row the (n - 1)th row multiplied by $a_1$, then from the (n - 1)th
row subtract the (n - 2)th also multiplied by $a_1$, etc. Finally, from the second
row subtract the first multiplied by $a_1$. We obtain
\[
d = \begin{vmatrix}
1 & 1 & 1 & \cdots & 1 \\
0 & a_2 - a_1 & a_3 - a_1 & \cdots & a_n - a_1 \\
0 & a_2^2 - a_1 a_2 & a_3^2 - a_1 a_3 & \cdots & a_n^2 - a_1 a_n \\
\vdots & \vdots & \vdots & & \vdots \\
0 & a_2^{n-1} - a_1 a_2^{n-2} & a_3^{n-1} - a_1 a_3^{n-2} & \cdots & a_n^{n-1} - a_1 a_n^{n-2}
\end{vmatrix}
\]


Expanding this determinant in terms of the first column, we arrive at a determinant
of order (n - 1); after factoring out common factors from all columns,
it will take the form
\[
d = (a_2 - a_1)(a_3 - a_1) \cdots (a_n - a_1) \cdot \begin{vmatrix}
1 & 1 & \cdots & 1 \\
a_2 & a_3 & \cdots & a_n \\
a_2^2 & a_3^2 & \cdots & a_n^2 \\
\vdots & \vdots & & \vdots \\
a_2^{n-2} & a_3^{n-2} & \cdots & a_n^{n-2}
\end{vmatrix}
\]
The last factor is the Vandermonde determinant of order (n - 1), that is, by
hypothesis, it is equal to the product of all the differences $a_i - a_j$ for
$2 \le j < i \le n$. Using the symbol $\prod$ to denote a product, we can write
\[
d = (a_2 - a_1)(a_3 - a_1) \cdots (a_n - a_1) \prod_{2 \le j < i \le n} (a_i - a_j)
  = \prod_{1 \le j < i \le n} (a_i - a_j)
\]


Using the same method, we can prove that the determinant 


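\[
d' = \begin{vmatrix}
a_1^{n-1} & a_2^{n-1} & a_3^{n-1} & \cdots & a_n^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
a_1^2 & a_2^2 & a_3^2 & \cdots & a_n^2 \\
a_1 & a_2 & a_3 & \cdots & a_n \\
1 & 1 & 1 & \cdots & 1
\end{vmatrix}
\]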


is equal to the product of all possible differences $a_i - a_j$, where $1 \le i < j \le n$,
that is,
\[
d' = \prod_{1 \le i < j \le n} (a_i - a_j)
\]
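Both product formulas are easy to test on concrete numbers. The sketch below is an added illustration, not part of the original text; the helper names `det`, `vandermonde` and `product_of_differences` and the sample values are assumptions chosen for the example.

```python
from itertools import combinations
from math import prod


def det(m):
    """Determinant by recursive cofactor expansion along the first column."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for i in range(n):
        if m[i][0] != 0:
            minor = [row[1:] for k, row in enumerate(m) if k != i]
            total += (-1) ** i * m[i][0] * det(minor)
    return total


def vandermonde(a):
    """Rows are the powers 0, 1, ..., n-1 of the numbers a_1, ..., a_n."""
    return [[x ** p for x in a] for p in range(len(a))]


def product_of_differences(a):
    """Product of a_i - a_j over all pairs with j < i."""
    return prod(a[i] - a[j] for j, i in combinations(range(len(a)), 2))


a = [2, 3, 5]
d = det(vandermonde(a))              # rows 1, a, a^2
d_prime = det(vandermonde(a)[::-1])  # same rows in reverse order: a^2, a, 1
print(d, product_of_differences(a), d_prime)
```

For three numbers there are three differences, so reversing the sign of each one changes the sign of the product; this is why `d_prime` comes out as the negative of `d` here.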

Generalizing the above-obtained expansions of a determinant
about a row or a column, we prove the following theorem which
has to do with the expansion of a determinant in terms of several rows
or columns.

Laplace’s theorem. Let there be arbitrarily chosen, in a determinant
d of order n, k rows (or k columns), $1 \le k \le n - 1$. Then
the sum of the products of all kth-order minors contained in the
chosen rows by their cofactors is equal to the determinant d.

Proof. Suppose, in determinant d, we choose rows with position
numbers $i_1, i_2, \ldots, i_k$. We know that the product of any minor M of
order k located in these rows by its cofactor consists of a certain 
number of terms of the determinant d taken with the signs they 
have in the determinant. The theorem will consequently be proved 
if we demonstrate that by making M run through all kth-order 
minors located in the chosen rows we obtain all the terms of the 
determinant, none being repeated. 

Let
\[
a_{1\alpha_1} a_{2\alpha_2} \cdots a_{n\alpha_n} \tag{3}
\]
be an arbitrary term of the determinant d. We separately take
the product of those elements of the term which belong to the rows
we have chosen with position numbers $i_1, i_2, \ldots, i_k$. This is the
product
\[
a_{i_1\alpha_{i_1}} a_{i_2\alpha_{i_2}} \cdots a_{i_k\alpha_{i_k}} \tag{4}
\]
The k factors of this product lie in k distinct columns, namely,
in the columns with position numbers $\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_k}$. These
position numbers of the columns are consequently determined by
specifying the term (3). If by M we denote the kth-order minor
lying at the intersection of the columns with these position numbers
$\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_k}$ and of the earlier chosen rows with the position
numbers $i_1, i_2, \ldots, i_k$, then the product (4) is one of the terms
of the minor M, and the product of all the elements of the term (3)
not in (4) is a term of its complementary minor. Thus, any term
of the determinant enters into the product of a certain (quite definite)
minor of order k made up of the chosen rows multiplied by its complementary
minor, and is a product of quite definite terms of these
two minors. Finally, in order to obtain the term that we took of
the determinant with the sign which it has in the determinant,
it remains, as we know, to replace the complementary minor by the
cofactor. This completes the proof of the theorem.

It is possible to give a slightly different proof. Namely,
the product of any kth-order minor M located in the chosen rows
by its cofactor consists of k! (n - k)! terms, since the kth-order
minor M consists of k! terms and its cofactor, differing possibly
from the minor of order n - k in sign alone, contains (n - k)!
terms. On the other hand, the number of kth-order minors contained
in the chosen rows is equal to the number of combinations of n taken
k at a time, that is, it is equal to the number
\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\]
Multiplying out, we find that the sum of the products of all
kth-order minors of the chosen rows by their cofactors consists
of n! summands. Such, however, is the total number of terms of the
determinant d. The theorem will thus be proved if we demonstrate
that any term of the determinant d appears at least once (and,
in that case, exactly once) in the sum at hand of the products
of the minors by their cofactors. It is left to the reader to repeat
(with slight simplifications) the reasoning given in the first proof.

The Laplace theorem enables one to reduce the computation
of an nth-order determinant to the computation of several determinants
of orders k and n - k. Generally speaking, there is a very
large number of such new determinants and so it is advisable to
apply the Laplace theorem only when it is possible to choose k rows
(or columns) in the determinant so that many of the kth-order
minors located in these rows are zero.
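Example 1. Suppose we have a determinant, all elements of which in the
first k rows and the last n - k columns are zero:
\[
d = \begin{vmatrix}
a_{11} & \cdots & a_{1k} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{k1} & \cdots & a_{kk} & 0 & \cdots & 0 \\
a_{k+1,1} & \cdots & a_{k+1,k} & a_{k+1,k+1} & \cdots & a_{k+1,n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{nk} & a_{n,k+1} & \cdots & a_{nn}
\end{vmatrix}
\]
This determinant is then equal to the product of two of its minors:
\[
d = \begin{vmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & & \vdots \\
a_{k1} & \cdots & a_{kk}
\end{vmatrix}
\cdot
\begin{vmatrix}
a_{k+1,k+1} & \cdots & a_{k+1,n} \\
\vdots & & \vdots \\
a_{n,k+1} & \cdots & a_{nn}
\end{vmatrix}
\]
To prove this, it suffices to expand the determinant about the first k rows.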
Example 2. Suppose we have a determinant d of order 2n, in the upper
left corner of which is an nth-order minor composed entirely of zeros. If the
nth-order minors lying in the upper right, lower left and lower right corners
of the determinant are denoted, respectively, by M, M′ and M″, so that
the determinant may be written symbolically as
\[
d = \begin{vmatrix} 0 & M \\ M' & M'' \end{vmatrix}
\]
then $d = (-1)^n M M'$.
To prove this, expand the determinant in terms of the first n rows and
note that
\[
s_M = (1 + 2 + \cdots + n) + [(n + 1) + (n + 2) + \cdots + 2n] = n + 2n^2
\]
that is, $s_M$ and n are of the same parity.
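Example 3. Evaluate the determinant
\[
d = \begin{vmatrix}
-4 & 1 & 2 & -2 & 1 \\
0 & 3 & 0 & 1 & -5 \\
2 & -3 & 1 & -3 & 1 \\
-1 & -1 & 3 & -1 & 0 \\
0 & 4 & 0 & 2 & 5
\end{vmatrix}
\]
Expanding it about the first and third columns which contain nicely located
zeros, we get
\[
\begin{aligned}
d ={}& (-1)^{1+3+1+3} \begin{vmatrix} -4 & 2 \\ 2 & 1 \end{vmatrix}
      \cdot \begin{vmatrix} 3 & 1 & -5 \\ -1 & -1 & 0 \\ 4 & 2 & 5 \end{vmatrix} \\
   &+ (-1)^{1+4+1+3} \begin{vmatrix} -4 & 2 \\ -1 & 3 \end{vmatrix}
      \cdot \begin{vmatrix} 3 & 1 & -5 \\ -3 & -3 & 1 \\ 4 & 2 & 5 \end{vmatrix} \\
   &+ (-1)^{3+4+1+3} \begin{vmatrix} 2 & 1 \\ -1 & 3 \end{vmatrix}
      \cdot \begin{vmatrix} 1 & -2 & 1 \\ 3 & 1 & -5 \\ 4 & 2 & 5 \end{vmatrix} \\
 ={}& (-8) \cdot (-20) - (-10) \cdot (-62) - 7 \cdot 87 = -1069
\end{aligned}
\]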
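The expansion used in this example can be written out as a short program. The following Python sketch is an added illustration under stated assumptions: the function names `det` and `laplace_by_columns` are hypothetical, columns are passed as zero-based indices in increasing order, and the sign exponent is computed from the one-based row and column numbers as in the theorem.

```python
from itertools import combinations


def det(m):
    """Determinant by recursive cofactor expansion along the first column."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for i in range(n):
        if m[i][0] != 0:
            minor = [row[1:] for k, row in enumerate(m) if k != i]
            total += (-1) ** i * m[i][0] * det(minor)
    return total


def laplace_by_columns(m, cols):
    """Laplace expansion about the chosen columns (0-based, increasing)."""
    n, k = len(m), len(cols)
    other_cols = [c for c in range(n) if c not in cols]
    total = 0
    for rows in combinations(range(n), k):
        minor = det([[m[r][c] for c in cols] for r in rows])
        if minor == 0:
            continue  # zero minors contribute nothing
        other_rows = [r for r in range(n) if r not in rows]
        complementary = det([[m[r][c] for c in other_cols] for r in other_rows])
        # sum of the 1-based row and column numbers of the minor
        s = sum(r + 1 for r in rows) + sum(c + 1 for c in cols)
        total += (-1) ** s * minor * complementary
    return total


d = [
    [-4, 1, 2, -2, 1],
    [0, 3, 0, 1, -5],
    [2, -3, 1, -3, 1],
    [-1, -1, 3, -1, 0],
    [0, 4, 0, 2, 5],
]
print(laplace_by_columns(d, [0, 2]), det(d))
```

Of the ten second-order minors in the first and third columns only three are nonzero, which is what makes this choice of columns convenient.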
7. Cramer’s Rule


The foregoing theory of determinants of order n allows us to show
that these determinants, which were introduced only by analogy
with second- and third-order determinants, may, like the latter,
be used to solve systems of linear equations. Let us first make one
additional remark regarding expansions of determinants in terms
of a row or a column; this remark will often come in handy in the
sequel.

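Expand the determinant
\[
d = \begin{vmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & & \vdots & & \vdots \\
a_{n1} & \cdots & a_{nj} & \cdots & a_{nn}
\end{vmatrix}
\]
about the jth column:
\[
d = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}
\]
Then, in this expansion, replace the elements of the jth column by
a set of n arbitrary numbers $b_1, b_2, \ldots, b_n$. The expression
\[
b_1 A_{1j} + b_2 A_{2j} + \cdots + b_n A_{nj}
\]
which you obtain will obviously serve as an expansion about the
jth column for the determinant
\[
d' = \begin{vmatrix}
a_{11} & \cdots & b_1 & \cdots & a_{1n} \\
a_{21} & \cdots & b_2 & \cdots & a_{2n} \\
\vdots & & \vdots & & \vdots \\
a_{n1} & \cdots & b_n & \cdots & a_{nn}
\end{vmatrix}
\]
which is obtained from the determinant d by replacing its jth column
by a column of the numbers $b_1, b_2, \ldots, b_n$. Indeed, replacing the
jth column of d does not affect the minors of the elements of the
column, and for this reason does not affect their cofactors.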

Let us apply this to the case when for the numbers $b_1, b_2, \ldots, b_n$
we take the elements of the kth column of the determinant d, where
$k \ne j$. The determinant resulting from such a replacement will
contain two identical columns (jth and kth) and therefore will be
zero. Hence, the expansion of this determinant about its jth column
will also be zero, that is
\[
a_{1k}A_{1j} + a_{2k}A_{2j} + \cdots + a_{nk}A_{nj} = 0 \quad \text{for } j \ne k
\]
Thus, the sum of the products of all elements of a certain column
of a determinant by the cofactors of the corresponding elements of
another column is zero. The same result of course holds true for the
rows of a determinant.
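The remark can be checked on a small case. In the Python sketch below, which is an added illustration, the third-order matrix is an arbitrary assumption and `det`, `cofactor` and `column_times_cofactors` are hypothetical helper names. The table it builds has the determinant on the diagonal (column k paired with its own cofactors) and zero everywhere else (column k paired with the cofactors of another column j).

```python
def det(m):
    """Determinant by recursive cofactor expansion along the first column."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for i in range(n):
        if m[i][0] != 0:
            minor = [row[1:] for k, row in enumerate(m) if k != i]
            total += (-1) ** i * m[i][0] * det(minor)
    return total


def cofactor(m, i, j):
    """Cofactor A_ij of the element in row i, column j (0-based indices)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def column_times_cofactors(m, k, j):
    """Sum over i of a_ik * A_ij: column k against the cofactors of column j."""
    return sum(m[i][k] * cofactor(m, i, j) for i in range(len(m)))


m = [
    [1, 2, 3],
    [0, 4, 5],
    [1, 0, 6],
]
table = [[column_times_cofactors(m, k, j) for j in range(3)] for k in range(3)]
print(det(m), table)
```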

Let us now examine systems of linear equations; we will confine
ourselves for the time being to systems in which the number of equations
is equal to the number of unknowns, i.e., systems of the form
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\tag{1}
\]
We also assume that the determinant d made up of the coefficients
of the unknowns of the system (called, for short, the determinant
of the system) is nonzero. Given these assumptions, we will
prove that the system (1) is consistent and even determinate.

In Sec. 2, when we solved a system of three equations in three 
unknowns, we multiplied each of the equations by a factor, and 
then added the equations; the coefficients of two of the unknowns 
proved to be zero. We now see immediately that the factors which 
we used were cofactors, in the determinant of the system, of the 
element which was the coefficient of the desired unknown in the 
given equation. We now use this device to solve system (1). 


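First suppose that system (1) is consistent and $\alpha_1, \alpha_2, \ldots, \alpha_n$
is one of its solutions. Hence, the following equations hold true:
\[
\begin{aligned}
a_{11}\alpha_1 + a_{12}\alpha_2 + \cdots + a_{1n}\alpha_n &= b_1, \\
a_{21}\alpha_1 + a_{22}\alpha_2 + \cdots + a_{2n}\alpha_n &= b_2, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n1}\alpha_1 + a_{n2}\alpha_2 + \cdots + a_{nn}\alpha_n &= b_n
\end{aligned}
\tag{2}
\]
Let j be any one of the numbers 1, 2, ..., n. Multiply both sides
of the first equation of (2) by $A_{1j}$, that is, by the cofactor of the
element $a_{1j}$ in the determinant d of the system. Multiply both
sides of the second equation by $A_{2j}$, and so on. Finally, multiply
both sides of the last equation by $A_{nj}$. Adding together separately
the left and right sides of all equations, we arrive at the following
equation:
\[
\begin{aligned}
&(a_{11}A_{1j} + a_{21}A_{2j} + \cdots + a_{n1}A_{nj})\,\alpha_1 \\
&\quad + (a_{12}A_{1j} + a_{22}A_{2j} + \cdots + a_{n2}A_{nj})\,\alpha_2 \\
&\quad + \cdots \\
&\quad + (a_{1n}A_{1j} + a_{2n}A_{2j} + \cdots + a_{nn}A_{nj})\,\alpha_n \\
&= b_1 A_{1j} + b_2 A_{2j} + \cdots + b_n A_{nj}
\end{aligned}
\]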
The coefficient of $\alpha_j$ in this equation is d, the coefficients of
all other $\alpha$ will, due to the remark made above, be zero, and the
constant term will be the determinant obtained from the determinant
d after replacing the jth column in it by a column of the constant
terms of system (1). If, as in Sec. 2, we denote this latter determinant
by $d_j$, then our equation takes the form
\[
d\,\alpha_j = d_j
\]
whence, because $d \ne 0$,
\[
\alpha_j = \frac{d_j}{d}
\]
This proves that if system (1) is consistent, then it possesses the
unique solution
\[
\alpha_1 = \frac{d_1}{d}, \quad \alpha_2 = \frac{d_2}{d}, \quad \ldots, \quad \alpha_n = \frac{d_n}{d} \tag{3}
\]


We will now show that the set (3) of numbers actually satisfies
system (1) of equations, that is, that (1) is consistent. We will
make use of the following commonly employed symbolism.

Any sum of the form $a_1 + a_2 + \cdots + a_n$ will be denoted
briefly by $\sum_{i=1}^{n} a_i$. But if we consider a sum whose terms $a_{ij}$ are labelled
with two subscripts, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$,
then we can first take the sums of the terms with fixed first
subscript, that is, the sums $\sum_{j=1}^{m} a_{ij}$, where i = 1, 2, ..., n, and
then add all the sums. We then obtain the following notation for
the sum of all elements $a_{ij}$:
\[
\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}
\]
However, we could first add the summands $a_{ij}$ with fixed second
subscript and then combine the resulting sums. Thus
\[
\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij}
\]
i.e., in a double sum the order of summation may be reversed.
Now put the values of the unknowns (3) into the ith equation
of system (1). Since the left side of the ith equation may be written
as $\sum_{j=1}^{n} a_{ij}x_j$ and since $d_j = \sum_{k=1}^{n} b_k A_{kj}$, we get
\[
\sum_{j=1}^{n} a_{ij} \cdot \frac{d_j}{d}
 = \frac{1}{d} \sum_{j=1}^{n} a_{ij} \Bigl( \sum_{k=1}^{n} b_k A_{kj} \Bigr)
 = \frac{1}{d} \sum_{k=1}^{n} b_k \Bigl( \sum_{j=1}^{n} a_{ij} A_{kj} \Bigr)
\]
With regard to these manipulations, note that the number $\frac{1}{d}$ turned
out to be a common factor in all summands and was therefore taken
outside the summation sign; besides, after changing the order of
summation, the factor $b_k$ was factored out of the inner sum since
it is not dependent on the subscript j of the inner summation.

We know that the expression $\sum_{j=1}^{n} a_{ij}A_{kj} = a_{i1}A_{k1} + a_{i2}A_{k2} + \cdots
+ a_{in}A_{kn}$ will be equal to d for k = i and to 0 for all other
k's. Thus, in our outer sum with respect to k there will be only
one summand left, namely, $b_i d$; i.e.,
\[
\sum_{j=1}^{n} a_{ij} \cdot \frac{d_j}{d} = \frac{1}{d} \cdot b_i d = b_i
\]
This is proof that the set (3) of numbers is indeed a solution to the
system (1) of equations.

We have obtained the following important result. 

A system of n linear equations in n unknowns, the determinant 
of which is nonzero, has a unique solution. This solution is obtained 
from formulas (3), that is by means of Cramer’s rule. The formula- 
tion of this rule is the same as in the case of a system of two equa- 
tions (see Sec. 2). 


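Example. Solve the system of linear equations
\[
\begin{aligned}
2x_1 + x_2 - 5x_3 + x_4 &= 8, \\
x_1 - 3x_2 - 6x_4 &= 9, \\
2x_2 - x_3 + 2x_4 &= -5, \\
x_1 + 4x_2 - 7x_3 + 6x_4 &= 0
\end{aligned}
\]
The determinant of the system is different from zero:
\[
d = \begin{vmatrix}
2 & 1 & -5 & 1 \\
1 & -3 & 0 & -6 \\
0 & 2 & -1 & 2 \\
1 & 4 & -7 & 6
\end{vmatrix} = 27
\]
and so Cramer’s rule is applicable. The values of the unknowns will have as
numerators the determinants
\[
d_1 = \begin{vmatrix}
8 & 1 & -5 & 1 \\
9 & -3 & 0 & -6 \\
-5 & 2 & -1 & 2 \\
0 & 4 & -7 & 6
\end{vmatrix} = 81, \qquad
d_2 = \begin{vmatrix}
2 & 8 & -5 & 1 \\
1 & 9 & 0 & -6 \\
0 & -5 & -1 & 2 \\
1 & 0 & -7 & 6
\end{vmatrix} = -108,
\]
\[
d_3 = \begin{vmatrix}
2 & 1 & 8 & 1 \\
1 & -3 & 9 & -6 \\
0 & 2 & -5 & 2 \\
1 & 4 & 0 & 6
\end{vmatrix} = -27, \qquad
d_4 = \begin{vmatrix}
2 & 1 & -5 & 8 \\
1 & -3 & 0 & 9 \\
0 & 2 & -1 & -5 \\
1 & 4 & -7 & 0
\end{vmatrix} = 27
\]
Thus,
\[
x_1 = 3, \quad x_2 = -4, \quad x_3 = -1, \quad x_4 = 1
\]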


will be the unique solution set of our system. 
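Formulas (3) translate directly into a short routine. The Python sketch below is an added illustration, not the book's own procedure: `det` and `cramer` are hypothetical names, determinants are computed by plain cofactor expansion, and `fractions.Fraction` is used so that the quotients $d_j / d$ stay exact.

```python
from fractions import Fraction


def det(m):
    """Determinant by recursive cofactor expansion along the first column."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for i in range(n):
        if m[i][0] != 0:
            minor = [row[1:] for k, row in enumerate(m) if k != i]
            total += (-1) ** i * m[i][0] * det(minor)
    return total


def cramer(a, b):
    """Solve a square system by formulas (3); requires a nonzero determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("the determinant of the system is zero")
    solution = []
    for j in range(len(a)):
        # d_j: replace the jth column of the system by the constant terms
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution


a = [
    [2, 1, -5, 1],
    [1, -3, 0, -6],
    [0, 2, -1, 2],
    [1, 4, -7, 6],
]
b = [8, 9, -5, 0]
print(det(a), cramer(a, b))
```

The routine evaluates n + 1 determinants, so it is meant as a check on small systems rather than as a practical solver.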


We did not consider the case when the determinant of a system
of n linear equations in n unknowns (1) is zero. It will be discussed
in Chapter 2, where it will find its place in the general theory of
systems involving any number of equations in any number of
unknowns.

One more remark is in order with respect to systems of n linear
equations in n unknowns. Given a system of n homogeneous linear
equations in n unknowns (see Sec. 4):
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0
\end{aligned}
\tag{4}
\]
In this case, all determinants $d_j$, j = 1, 2, ..., n, contain
a column made up of zeros and are therefore equal to zero. Thus,
if the determinant of system (4) is nonzero, that is if Cramer’s rule
is applicable, then the only solution of system (4) will be the trivial
solution
\[
x_1 = 0, \quad x_2 = 0, \quad \ldots, \quad x_n = 0 \tag{5}
\]


Whence follows the result: 

If a system of n homogeneous linear equations in n unknowns has 
nontrivial solutions, then the determinant of the system is necessarily 
zero. 
In Sec. 12 it will also be shown that, conversely, if the determi- 
nant of such a system is indeed equal to zero, then the system will 
have solutions other than the trivial solution, the existence of 
which is obvious for every system of homogeneous equations. 


Example. For what values of k can the system of equations
\[
\begin{aligned}
kx_1 + x_2 &= 0, \\
x_1 + kx_2 &= 0
\end{aligned}
\]
have nontrivial solutions?
The determinant of this system
\[
\begin{vmatrix} k & 1 \\ 1 & k \end{vmatrix} = k^2 - 1
\]
will be zero only when $k = \pm 1$. It is easy to see that for each one of these
two values of k the given system will indeed have nontrivial solutions.
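As a worked check of the last claim (added here for illustration), substitute each value of k into the system. For k = 1 both equations reduce to the same relation, and for k = -1 they differ only in sign:
\[
k = 1: \quad x_1 + x_2 = 0, \quad \text{for instance } x_1 = 1,\ x_2 = -1;
\]
\[
k = -1: \quad -x_1 + x_2 = 0, \quad \text{for instance } x_1 = 1,\ x_2 = 1.
\]
In either case every multiple of the indicated solution is again a solution.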


The significance of Cramer’s rule lies mainly in the fact that 
for cases when it is applicable it offers an explicit expression of 
the solution of the system in terms of the coefficients of the system.
However, Cramer’s rule involves very unwieldy computations; in
the case of a system of n linear equations in n unknowns, one has
to compute n + 1 determinants of the nth order. The method of
successive elimination of unknowns given in Sec. 1 is much more
convenient in this respect since the computations involved here
are actually equivalent to those required in the evaluation of a single
determinant of the nth order.
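For comparison, here is a minimal Python sketch of successive elimination, added as an illustration. It is one common way of organizing the method and is not claimed to reproduce the exact layout of Sec. 1; the name `solve_by_elimination` is hypothetical, exact rational arithmetic via `fractions.Fraction` is an assumption of the sketch, and the system solved is the one from the example above.

```python
from fractions import Fraction


def solve_by_elimination(a, b):
    """Solve a square system with nonzero determinant by successive elimination."""
    n = len(a)
    # augmented matrix: coefficients followed by the constant term
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # find an equation at or below position col with a nonzero coefficient
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the determinant of the system is zero")
        m[col], m[pivot] = m[pivot], m[col]
        # eliminate the unknown number col from all later equations
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # back substitution, starting from the last unknown
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


a = [
    [2, 1, -5, 1],
    [1, -3, 0, -6],
    [0, 2, -1, 2],
    [1, 4, -7, 6],
]
b = [8, 9, -5, 0]
print(solve_by_elimination(a, b))
```

A single pass of elimination replaces the five fourth-order determinants that Cramer's rule needs for this system.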

In applications, we often encounter systems of linear equations 
whose coefficients and constant terms are real numbers obtained 
in measurements of physical quantities and as such are known only 
approximately, to within a specified accuracy. The foregoing methods 
are then sometimes rather inconvenient because they lead to results 
with poor accuracy. A variety of iterative procedures have taken 
their place. These are methods which yield solutions of systems 
of equations via successive approximations of the unknowns. The 
interested reader will find such methods described in texts dealing 
with the theory of approximate calculations. 


CHAPTER 2


SYSTEMS OF LINEAR
EQUATIONS
(GENERAL THEORY)


8. n-Dimensional Vector Space


To construct a general theory of systems of linear equations 
we will need more than the apparatus that sufficed with such success 
in the solution of systems to which Cramer’s rule was applicable. 
Besides determinants and matrices we will need a new concept, 
which, perhaps, is of still greater general mathematical interest — 
that of multidimensional vector spaces. 

First a few preliminary remarks. From the course of analytic 

geometry we know that any point in a plane is determined (for 
specified coordinate axes) by its two coordinates, which is to say, 
by an ordered set of two real numbers. Any vector in a plane is 
determined by its two components, which again is an ordered set 
of two real numbers. Similarly, a point in three-dimensional space 
is determined by three coordinates, a vector in space, by three 
components. 
In geometry and also in mechanics and physics we often encoun-
ter objects whose specification requires more than three real numbers. 
For instance, let us consider a collection of spheres in three-dimen- 
sional space. To specify a sphere completely we need the coordinates 
of its centre and the radius; this amounts to an ordered set of four 
real numbers, of which, incidentally, the radius can only assume 
positive values. On the other hand, let us consider various positions 
of a solid in space. The position of a solid will be fully defined if 
we indicate the coordinates of its centre of gravity (this requires 
three real numbers), the direction of some fixed axis passing through 
the centre of gravity (two numbers—two out of three direction 
cosines), and, finally, the angle of rotation about this axis. Thus, 
the position of a solid body in space is determined by an ordered 
set of six real numbers. 

These examples suggest considering collections of all possible
ordered sets of n real numbers. After introducing the operations
of addition and multiplication by a scalar (this will be done later
on by analogy with appropriate operations involving vectors in
three-dimensional space expressed in terms of components), we call
this collection an n-dimensional vector space. Thus, n-dimensional
space is only an algebraic structure which retains certain of the
simplest properties of collections of vectors of three-dimensional
space emanating from a coordinate origin.

An ordered set of n numbers (an ordered n-tuple)
\[
\alpha = (a_1, a_2, \ldots, a_n) \tag{1}
\]
is called an n-dimensional vector. The numbers $a_i$, i = 1, 2, ..., n,
will be called the components of the vector $\alpha$. The vectors $\alpha$ and
\[
\beta = (b_1, b_2, \ldots, b_n) \tag{2}
\]
will be considered equal if their components, in the same places,
coincide, that is, if $a_i = b_i$, i = 1, 2, ..., n. Lower-case Greek
letters will be used to denote vectors and lower-case Latin letters to
denote scalars.

Examples of vectors are: (1) Vector segments (directed line- 
segments) emanating from the coordinate origin in a plane or in 
three-dimensional space will, given a fixed system of coordinates, 
be two- and three-dimensional vectors in the meaning of the definition 
given above. (2) The coefficients of a linear equation in n unknowns 
constitute an n-dimensional vector. (3) Any solution of a system 
of linear equations in n unknowns is an n-dimensional vector. 
(4) If an s by n matrix is given (s rows and n columns), then its 
rows are n-dimensional vectors, its columns, s-dimensional vectors. 
(5) The s by n matrix itself can be regarded as an sn-dimensional 
vector: all we need to do is read the elements of the matrix one 
after the other, row by row; in particular, any square matrix of
order n may be regarded as an $n^2$-dimensional vector, and it is
quite obvious that any $n^2$-dimensional vector may be obtained
in this way from a matrix of order n.

The sum of vectors (1) and (2) is the vector
\[
\alpha + \beta = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n) \tag{3}
\]
whose components are sums of the corresponding components of the
vectors being added. Addition of vectors is commutative and associative
because of the commutativity and associativity of the addition
of numbers.

The role of zero is played by the zero vector:
\[
0 = (0, 0, \ldots, 0) \tag{4}
\]
Indeed,
\[
\alpha + 0 = (a_1 + 0, a_2 + 0, \ldots, a_n + 0) = (a_1, a_2, \ldots, a_n) = \alpha
\]
We use the same symbol 0 for the zero vector as for the number 0.
There is never any difficulty in deciding whether it is the number
zero or the zero vector we are talking about at any time. However,
from now on the reader should bear in mind the possibility of different
interpretations of the symbol 0.

We use the term opposite vector (negative) of the vector (1) for
the vector
\[
-\alpha = (-a_1, -a_2, \ldots, -a_n) \tag{5}
\]
It is obvious that $\alpha + (-\alpha) = 0$. It is now easy to see that for
the addition of vectors there is an inverse operation, subtraction:
the difference between the vectors (1) and (2) is the vector
$\alpha - \beta = \alpha + (-\beta)$, or
\[
\alpha - \beta = (a_1 - b_1, a_2 - b_2, \ldots, a_n - b_n) \tag{6}
\]


The addition of n-dimensional vectors defined by formula (3)
arose out of the geometric addition of vectors in the plane or in
three-dimensional space performed by the parallelogram rule. In
geometry we also deal with the multiplication of a vector by a real
number (“scalar”): the multiplication of a vector $\alpha$ by a scalar k
signifies, for k > 0, a stretching of $\alpha$ by a factor k (it is compression
if k < 1), and for k < 0 a stretching by a factor |k| and reversal
of direction. Expressing this rule in terms of the components of the
vector $\alpha$ and passing to the general case at hand, we obtain the
following definition.

The product of a vector (1) by a scalar k is the vector
\[
k\alpha = \alpha k = (ka_1, ka_2, \ldots, ka_n) \tag{7}
\]
whose components are equal to the product of the corresponding
components of the vector $\alpha$ by k.

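From this definition there follow important properties which
may be verified by the reader:
\begin{align}
k(\alpha + \beta) &= k\alpha + k\beta, \tag{8} \\
(k + l)\alpha &= k\alpha + l\alpha, \tag{9} \\
k(l\alpha) &= (kl)\alpha, \tag{10} \\
1 \cdot \alpha &= \alpha \tag{11}
\end{align}
The following properties are just as easy to verify but they may
also be obtained as corollaries to Properties (8)-(11):
\begin{align}
0 \cdot \alpha &= 0, \tag{12} \\
(-1) \cdot \alpha &= -\alpha, \tag{13} \\
k \cdot 0 &= 0, \tag{14}
\end{align}
\[
\text{if } k\alpha = 0, \text{ then either } k = 0 \text{ or } \alpha = 0 \tag{15}
\]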
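The componentwise definitions (3) and (7) are short enough to state as code. The Python sketch below is an added illustration; the names `add` and `scale` and the sample vectors are assumptions, with vectors modelled as tuples of integers. It spot-checks Properties (8)-(14) on one pair of vectors, which illustrates the properties but of course does not prove them.

```python
def add(a, b):
    """Sum of two vectors, formula (3): add corresponding components."""
    return tuple(x + y for x, y in zip(a, b))


def scale(k, a):
    """Product of a vector by a scalar, formula (7)."""
    return tuple(k * x for x in a)


alpha = (1, -2, 3)
beta = (4, 0, -1)
zero = (0, 0, 0)
k, l = 3, -2

checks = {
    8: scale(k, add(alpha, beta)) == add(scale(k, alpha), scale(k, beta)),
    9: scale(k + l, alpha) == add(scale(k, alpha), scale(l, alpha)),
    10: scale(k, scale(l, alpha)) == scale(k * l, alpha),
    11: scale(1, alpha) == alpha,
    12: scale(0, alpha) == zero,
    13: add(alpha, scale(-1, alpha)) == zero,
    14: scale(k, zero) == zero,
}
print(checks)
```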




The collection of all n-dimensional vectors with real components
regarded in conjunction with the operations of addition of vectors
and multiplication of a vector by a scalar is called an n-dimensional
vector space.

Note that the definition of an n-dimensional vector space does 
not include multiplication of a vector by a vector. It would be 
easy to define multiplication of vectors—assume, say, that the 
components of a product of vectors are equal to the products of the 
corresponding components of the factors. However, such multiplica- 
tion would not find any serious applications. Thus, vector segments 
emanating from a coordinate origin in the plane or in three-dimen- 
sional space constitute (for a fixed system of coordinates) a two- 
dimensional and, respectively, a three-dimensional vector space. 
The addition of vectors and the multiplication of a vector by a scalar 
are, as we have pointed out above, geometrically important, whereas 
it is impossible to give any reasonable geometrical interpretation 
to the componentwise multiplication of vectors. 

Let us consider another example. The left side of a linear equation
in n unknowns, that is, an expression of the form
\[
f = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n
\]
is called a linear form in the unknowns $x_1, x_2, \ldots, x_n$. The linear
form f is obviously defined completely by the vector $(a_1, a_2, \ldots, a_n)$
of its coefficients; conversely, any n-dimensional vector uniquely
determines some linear form. The addition of vectors and the multiplication
of a vector by a scalar become corresponding operations
involving linear forms; these operations were extensively used
in Sec. 1. Componentwise multiplication of vectors in this instance
is meaningless.


9. Linear Dependence of Vectors 


A vector $\beta$ of n-dimensional vector space is proportional to
vector $\alpha$ if there exists a number k such that $\beta = k\alpha$ [see formula (7)
of the preceding section]. In particular, the zero vector is proportional
to any vector $\alpha$ due to the equality $0 = 0 \cdot \alpha$. But if $\beta = k\alpha$
and $\beta \ne 0$, whence $k \ne 0$, then $\alpha = k^{-1}\beta$, that is, for nonzero
vectors, proportionality possesses the property of symmetry.

A generalization of the concept of proportionality of vectors
is the following concept which we have already (in the case of rows
in a matrix) encountered in Sec. 4: a vector $\beta$ is called a linear
combination of the vectors $\alpha_1, \alpha_2, \ldots, \alpha_s$ if there exist numbers
$l_1, l_2, \ldots, l_s$ such that
\[
\beta = l_1\alpha_1 + l_2\alpha_2 + \cdots + l_s\alpha_s
\]
Thus the jth component of the vector $\beta$, j = 1, 2, ..., n, is equal
(because of the definition of a sum of vectors and a product of a vector
by a scalar) to the sum of the products of the jth components of
the vectors $\alpha_1, \alpha_2, \ldots, \alpha_s$ by $l_1, l_2, \ldots, l_s$, respectively.

A system of vectors
\[
\alpha_1, \alpha_2, \ldots, \alpha_{r-1}, \alpha_r \quad (r \ge 2) \tag{1}
\]
is linearly dependent if at least one of the vectors is a linear combination
of the remaining vectors of the system; it is called linearly
independent otherwise.

We give another form of this extremely important definition:
a system of vectors (1) is linearly dependent if there exist numbers
$k_1, k_2, \ldots, k_r$, at least one of which is nonzero, such that the equation
\[
k_1\alpha_1 + k_2\alpha_2 + \cdots + k_r\alpha_r = 0 \tag{2}
\]
holds true.
Proof of the equivalence of these two definitions is not difficult.
For example, let the vector $\alpha_r$ of system (1) be a linear combination
of the remaining vectors:
\[
\alpha_r = l_1\alpha_1 + l_2\alpha_2 + \cdots + l_{r-1}\alpha_{r-1}
\]
From this there follows the equation
\[
l_1\alpha_1 + l_2\alpha_2 + \cdots + l_{r-1}\alpha_{r-1} - \alpha_r = 0
\]
which is like (2), where $k_i = l_i$ for i = 1, 2, ..., r - 1 and
$k_r = -1$, that is, $k_r \ne 0$. Conversely, let the vectors (1) be connected
by the relation (2) in which, say, $k_r \ne 0$. Then
\[
\alpha_r = \Bigl(-\frac{k_1}{k_r}\Bigr)\alpha_1 + \Bigl(-\frac{k_2}{k_r}\Bigr)\alpha_2 + \cdots + \Bigl(-\frac{k_{r-1}}{k_r}\Bigr)\alpha_{r-1}
\]
Vector $\alpha_r$ has proved to be a linear combination of the vectors
$\alpha_1, \alpha_2, \ldots, \alpha_{r-1}$.


Example. The system of vectors
\[
\alpha_1 = (5, 2, 1), \quad \alpha_2 = (-1, 3, 3), \quad \alpha_3 = (9, 7, 5), \quad \alpha_4 = (3, 8, 7)
\]
is linearly dependent, since the vectors are connected by the relation
\[
4\alpha_1 - \alpha_2 - 3\alpha_3 + 2\alpha_4 = 0
\]
In this relation all the coefficients are different from zero. However, there are
other linear dependences between the vectors, dependences in which some of
the coefficients are zero, for instance
\[
2\alpha_1 + \alpha_2 - \alpha_3 = 0, \qquad 3\alpha_2 + \alpha_3 - 2\alpha_4 = 0
\]
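The three relations of this example can be verified component by component. The following Python sketch is an added illustration; `linear_combination` is a hypothetical helper that forms $k_1\alpha_1 + \cdots + k_r\alpha_r$ for vectors given as tuples.

```python
def linear_combination(coefficients, vectors):
    """Return k_1*v_1 + ... + k_r*v_r, computed component by component."""
    n = len(vectors[0])
    return tuple(
        sum(k * v[j] for k, v in zip(coefficients, vectors)) for j in range(n)
    )


vectors = [(5, 2, 1), (-1, 3, 3), (9, 7, 5), (3, 8, 7)]

relations = [
    (4, -1, -3, 2),  # all coefficients nonzero
    (2, 1, -1, 0),   # does not involve the fourth vector
    (0, 3, 1, -2),   # does not involve the first vector
]
for coefficients in relations:
    print(coefficients, linear_combination(coefficients, vectors))
```

Each relation should produce the zero vector (0, 0, 0).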


The latter definition of a linear dependence given above is also
applicable to the case of r = 1, that is, to the case of a system
consisting of one vector $\alpha$: this system is linearly dependent if and
only if $\alpha = 0$. Indeed, if $\alpha = 0$, then, say, for k = 1 we will have
$k\alpha = 0$. Conversely, if $k\alpha = 0$ and $k \ne 0$, then $\alpha = 0$.

Note the following property of the concept of linear dependence.
If some subsystem of the system (1) of vectors is linearly dependent,
then the whole system (1) is linearly dependent.

Indeed, let the vectors $\alpha_1, \alpha_2, \ldots, \alpha_s$ of system (1), where
s < r, be connected by the relation
\[
k_1\alpha_1 + k_2\alpha_2 + \cdots + k_s\alpha_s = 0
\]
in which not all coefficients are zero. It then follows that the relation
\[
k_1\alpha_1 + k_2\alpha_2 + \cdots + k_s\alpha_s + 0 \cdot \alpha_{s+1} + \cdots + 0 \cdot \alpha_r = 0
\]
holds, that is, system (1) is linearly dependent.

From this property follows the linear dependence of any system
of vectors containing two equal or, generally, two proportional
vectors and also of any system containing the zero vector. The
property we have just proved can also be stated as follows: if a system
(1) of vectors is linearly independent, then any subsystem of (1) is also
linearly independent.

The question arises as to how many vectors a linearly indepen- 
dent system of n-dimensional vectors can contain and, in particular, 
whether there exist systems with an arbitrarily large number of 
vectors. To answer this question, let us consider the following 
vectors in an n-dimensional vector space: 


\[
\begin{aligned}
\varepsilon_1 &= (1, 0, 0, \ldots, 0), \\
\varepsilon_2 &= (0, 1, 0, \ldots, 0), \\
&\cdots\cdots\cdots\cdots \\
\varepsilon_n &= (0, 0, 0, \ldots, 1)
\end{aligned}
\tag{3}
\]
They are called unit vectors of that space. The system of unit vectors
will be linearly independent: let
\[
k_1\varepsilon_1 + k_2\varepsilon_2 + \cdots + k_n\varepsilon_n = 0
\]
Since the left side of this equation is equal to the vector
$(k_1, k_2, \ldots, k_n)$, it follows that
\[
(k_1, k_2, \ldots, k_n) = 0
\]
or $k_i = 0$, i = 1, 2, ..., n, since all the components of the zero
vector are zero, and equality of vectors is equivalent to equality
of their corresponding components.

Thus, in n-dimensional vector space we have found one linearly 
independent system consisting of n vectors. The reader will learn 
later on that there actually exist an infinite number of distinct 
systems of that kind in this space. 

On the other hand, let us prove the following theorem. 

For s > n, any s vectors of an n-dimensional vector space constitute 
a linearly dependent system. 
Let there be given the vectors
\[
\begin{aligned}
\alpha_1 &= (a_{11}, a_{12}, \ldots, a_{1n}), \\
\alpha_2 &= (a_{21}, a_{22}, \ldots, a_{2n}), \\
&\cdots\cdots\cdots\cdots \\
\alpha_s &= (a_{s1}, a_{s2}, \ldots, a_{sn})
\end{aligned}
\]
We have to choose scalars $k_1, k_2, \ldots, k_s$, not all zero, such that
\[
k_1\alpha_1 + k_2\alpha_2 + \cdots + k_s\alpha_s = 0 \tag{4}
\]
Passing from (4) to the corresponding equations between the components,
we get
\[
\begin{aligned}
a_{11}k_1 + a_{21}k_2 + \cdots + a_{s1}k_s &= 0, \\
a_{12}k_1 + a_{22}k_2 + \cdots + a_{s2}k_s &= 0, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{1n}k_1 + a_{2n}k_2 + \cdots + a_{sn}k_s &= 0
\end{aligned}
\tag{5}
\]
However, equations (5) constitute a system of n homogeneous
linear equations in s unknowns $k_1, k_2, \ldots, k_s$. The number of
equations in this system is less than the number of unknowns, and
therefore, as proved at the end of Sec. 4, the system has nontrivial
solutions. We can thus choose scalars $k_1, k_2, \ldots, k_s$ (not all
zero) which will satisfy requirement (4). The theorem is proved.
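The proof is constructive in spirit: a dependence relation is a nontrivial solution of system (5). The Python sketch below is an added illustration of one way to find such a solution; the name `dependence_coefficients` is hypothetical, and the use of complete (Gauss-Jordan) elimination with exact fractions is an assumption of the sketch rather than the procedure of Sec. 4. One unknown that never becomes a leading unknown is set to 1, and the leading unknowns are read off from the reduced equations.

```python
from fractions import Fraction


def dependence_coefficients(vectors):
    """Return k_1..k_s, not all zero, with k_1*v_1 + ... + k_s*v_s = 0.

    Returns None when only the trivial solution exists.
    """
    s, n = len(vectors), len(vectors[0])
    # Equation j of system (5): a_1j*k_1 + a_2j*k_2 + ... + a_sj*k_s = 0
    m = [[Fraction(vectors[i][j]) for i in range(s)] for j in range(n)]
    pivots = []  # pivots[r] is the leading unknown of reduced equation r
    row = 0
    for col in range(s):
        if row == n:
            break  # every equation already has a leading unknown
        p = next((r for r in range(row, n) if m[r][col] != 0), None)
        if p is None:
            continue  # this unknown stays free
        m[row], m[p] = m[p], m[row]
        lead = m[row][col]
        m[row] = [x / lead for x in m[row]]
        for r in range(n):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
    free = [c for c in range(s) if c not in pivots]
    if not free:
        return None
    k = [Fraction(0)] * s
    k[free[0]] = Fraction(1)  # give one free unknown a nonzero value
    for r, col in enumerate(pivots):
        k[col] = -m[r][free[0]]
    return k


vectors = [(5, 2, 1), (-1, 3, 3), (9, 7, 5), (3, 8, 7)]
k = dependence_coefficients(vectors)
print(k)
```

Here s = 4 exceeds n = 3, so at least one unknown is always free and the function cannot return None for this input.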


Let us call a linearly independent system of n-dimensional
vectors
\[
\alpha_1, \alpha_2, \ldots, \alpha_r \tag{6}
\]
a maximal linearly independent system if by adjoining to this
system any n-dimensional vector $\beta$ we obtain a linearly dependent
system. Since in every linear dependence relating the vectors
$\alpha_1, \alpha_2, \ldots, \alpha_r, \beta$, the coefficient of $\beta$ must be nonzero (otherwise
system (6) would be linearly dependent), it follows that the vector
$\beta$ is expressed linearly in terms of the vectors (6). Therefore the
system (6) of vectors is a maximal linearly independent system if
and only if the vectors (6) are linearly independent and any n-dimensional
vector $\beta$ is a linear combination of them.

From the results obtained above it follows that in an n-dimensional 
space any linearly independent system consisting of n vectors is maximal 
and also that any maximal linearly independent system of vectors of 
this space consists of at most n vectors. 

Every linearly independent system of n-dimensional vectors is 
contained in at least one maximal linearly independent system. Indeed, 
if a given system of vectors is not maximal, then one vector may 
be added to it so that the resulting system remains linearly
independent. If this new system is still not maximal, then another vector
may be added to it, and so on. However, this process cannot continue
endlessly because every system of n-dimensional vectors consisting 
of n +1 vectors is linearly dependent. 

Since every system consisting of one nonzero vector is linearly 
independent, we find that any nonzero vector is contained in some 
maximal linearly independent system, and for this reason there are 
infinitely many different maximal linearly independent systems of 
vectors in an n-dimensional vector space. 

The question arises: do there exist, in this space, maximal 
linearly independent systems with a smaller number of vectors 
than n or is the number of vectors in any such system invariably 
equal to n? The answer to this important question will be given 
below after a few preliminary investigations. 

If vector $\beta$ is a linear combination of the vectors

$$\alpha_1, \alpha_2, \ldots, \alpha_r \tag{7}$$

it is often said that $\beta$ is expressed linearly in terms of system (7).
Naturally, if vector $\beta$ is linearly expressed in terms of some subsystem
of this system, then it will be linearly expressed in terms of (7)
as well: it would be sufficient to take the remaining vectors of the
system with coefficients equal to zero. Generalizing this terminology,
we say that the system of vectors

$$\beta_1, \beta_2, \ldots, \beta_s \tag{8}$$

is expressed linearly in terms of system (7) if every vector $\beta_i$,
$i = 1, 2, \ldots, s$, is a linear combination of the vectors of (7).
We prove the transitivity of this concept: if system (8) is expressed
linearly in terms of (7), and the system of vectors

$$\gamma_1, \gamma_2, \ldots, \gamma_t \tag{9}$$

is expressed linearly in terms of (8), then (9) is expressed linearly in
terms of (7) as well.

Indeed,

$$\gamma_j = \sum_{i=1}^{s} l_{ji}\beta_i, \quad j = 1, 2, \ldots, t \tag{10}$$

but $\beta_i = \sum_{m=1}^{r} k_{im}\alpha_m$, $i = 1, 2, \ldots, s$. Substituting these
expressions into (10), we get

$$\gamma_j = \sum_{i=1}^{s} l_{ji}\left(\sum_{m=1}^{r} k_{im}\alpha_m\right) = \sum_{m=1}^{r}\left(\sum_{i=1}^{s} l_{ji}k_{im}\right)\alpha_m$$

In other words, every vector $\gamma_j$, $j = 1, 2, \ldots, t$, is a linear
combination of vectors of system (7).
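
The substitution above amounts to multiplying the two tables of coefficients. A minimal sketch follows, with made-up sample data and hypothetical helper names `combine` and `compose`: it forms the vectors of (9) in two ways, through (8) and directly through (7) using the coefficients $\sum_i l_{ji}k_{im}$, and checks that the results agree.

```python
def combine(coeffs, vectors):
    """Form the linear combinations of `vectors` given by each row of `coeffs`."""
    n = len(vectors[0])
    return [tuple(sum(c * v[t] for c, v in zip(row, vectors)) for t in range(n))
            for row in coeffs]


def compose(l, k):
    """Coefficients of (9) in terms of (7): entry (j, m) is sum_i l[j][i] * k[i][m]."""
    return [[sum(l[j][i] * k[i][m] for i in range(len(k)))
             for m in range(len(k[0]))]
            for j in range(len(l))]


alphas = [(1, 0, 2), (0, 1, -1)]   # system (7), r = 2 (sample data)
k = [[1, 1], [2, -1], [0, 3]]      # each beta_i in terms of the alphas, s = 3
l = [[1, 0, 2], [-1, 1, 1]]        # each gamma_j in terms of the betas, t = 2

betas = combine(k, alphas)
gammas_via_betas = combine(l, betas)
gammas_via_alphas = combine(compose(l, k), alphas)
print(gammas_via_betas == gammas_via_alphas)
```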
Two systems of vectors are termed equivalent if each one of them
can be expressed linearly in terms of the other. From the proof
given here of the transitivity of the property of systems of vectors
to be expressed linearly in terms of each other there follows the
transitivity of the concept of equivalence of systems of vectors and
also the following assertion: if two systems of vectors are equivalent
and if some vector is expressed linearly in terms of one of these systems,
then it will be expressed linearly in terms of the other too.

One cannot assert that if one of two equivalent systems of vectors 
is linearly independent, then the other system also possesses this 
property. But if both systems are linearly independent, then an 
important statement can be made with respect to the number of 
vectors entering into them. First let us prove the following theorem 
which, because of the role it will play in the future, it will be con- 
venient to term a fundamental theorem. 

If in an n-dimensional vector space we have two systems of vectors:

(I) $\alpha_1, \alpha_2, \ldots, \alpha_r$,
(II) $\beta_1, \beta_2, \ldots, \beta_s$,

the first being linearly independent and expressible linearly in terms
of the second, then the number of vectors in the first system does not
exceed that in the second system, or $r \le s$.

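Let $r > s$. By hypothesis, each vector of system (I) can be expressed
linearly in terms of system (II):

$$
\begin{aligned}
\alpha_1 &= a_{11}\beta_1 + a_{12}\beta_2 + \ldots + a_{1s}\beta_s,\\
\alpha_2 &= a_{21}\beta_1 + a_{22}\beta_2 + \ldots + a_{2s}\beta_s,\\
\ldots\ldots &\\
\alpha_r &= a_{r1}\beta_1 + a_{r2}\beta_2 + \ldots + a_{rs}\beta_s
\end{aligned}
\tag{11}
$$

The coefficients of these linear expressions constitute a system of
r s-dimensional vectors:

$$
\begin{aligned}
\gamma_1 &= (a_{11}, a_{12}, \ldots, a_{1s}),\\
\gamma_2 &= (a_{21}, a_{22}, \ldots, a_{2s}),\\
\ldots\ldots &\\
\gamma_r &= (a_{r1}, a_{r2}, \ldots, a_{rs})
\end{aligned}
$$

Since $r > s$, these vectors are linearly dependent, that is,

$$k_1\gamma_1 + k_2\gamma_2 + \ldots + k_r\gamma_r = 0$$

where not all coefficients $k_1, k_2, \ldots, k_r$ are zero. Whence we arrive
at certain equations between the components:

$$\sum_{i=1}^{r} k_i a_{ij} = 0, \quad j = 1, 2, \ldots, s \tag{12}$$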
Let us now consider the following linear combination of vectors
of system (I):

$$k_1\alpha_1 + k_2\alpha_2 + \ldots + k_r\alpha_r$$

or, more compactly, $\sum_{i=1}^{r} k_i\alpha_i$. Utilizing (11) and (12), we get

$$\sum_{i=1}^{r} k_i\alpha_i = \sum_{i=1}^{r} k_i\left(\sum_{j=1}^{s} a_{ij}\beta_j\right) = \sum_{j=1}^{s}\left(\sum_{i=1}^{r} k_i a_{ij}\right)\beta_j = 0$$

But this runs counter to the linear independence of system (I).

From the fundamental theorem just proved we have the following 
result. 

Any two equivalent linearly independent systems of vectors contain 
an equal number of vectors. 

Any two maximal linearly independent systems of n-dimensional 
vectors are evidently equivalent. They therefore consist of one and 
the same number of vectors, and since (as we know) there exist 
systems of that kind consisting of n vectors, we finally get the answer 
to the earlier posed question: every maximal linearly independent
system of vectors of an n-dimensional vector space consists of n vectors. 

Some corollaries follow. 

If in a given linearly dependent system of vectors we take two 
maximal linearly independent subsystems, that is, subsystems to 
which no vector of our system can be adjoined without spoiling the linear 
independence, then these subsystems contain an equal number of vectors. 

Indeed, if in the system of vectors

$$\alpha_1, \alpha_2, \ldots, \alpha_r \tag{13}$$

the subsystem

$$\alpha_1, \alpha_2, \ldots, \alpha_s, \quad s < r \tag{14}$$

is a maximal linearly independent subsystem, then any one of the
vectors $\alpha_{s+1}, \ldots, \alpha_r$ is expressible linearly in terms of system (14).
On the other hand, any vector $\alpha_i$ of system (13) is linearly expressible
in terms of this system: it is only necessary to take the coefficient
1 for the vector $\alpha_i$, and the coefficient 0 for all the other vectors.
It is now easy to see that systems (13) and (14) are equivalent. From
this it follows that (13) is equivalent to any one of its maximal
linearly independent subsystems, and therefore all the subsystems are
equivalent; i.e., being linearly independent, they contain the same
number of vectors each.

The number of vectors in any maximal linearly independent 
subsystem of a given system of vectors is termed the rank of the 
system. Taking advantage of this concept, we derive yet another 
corollary from the fundamental theorem. 
Suppose there are two systems of n-dimensional vectors:

$$\alpha_1, \alpha_2, \ldots, \alpha_r \tag{15}$$

and

$$\beta_1, \beta_2, \ldots, \beta_s \tag{16}$$

which are not necessarily linearly independent; the rank of system (15)
is equal to the number k, the rank of system (16), to the number l. If
the first system is expressed linearly in terms of the second, then
$k \le l$. But if these systems are equivalent, then $k = l$.

In fact, let

$$\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_k} \tag{17}$$

and

$$\beta_{j_1}, \beta_{j_2}, \ldots, \beta_{j_l} \tag{18}$$

be, respectively, any maximal linearly independent subsystems
of (15) and (16). Then systems (15) and (17) are equivalent and
the same holds true for (16) and (18). From the fact that (15) is
linearly expressible in terms of (16) it now follows that (17) is
also linearly expressible in terms of (16) and therefore in terms of the
equivalent system (18). It then remains, utilizing the linear
independence of system (17), to apply the fundamental theorem. The second
assertion of the corollary being proved follows directly from the
first.
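
The notion of rank of a system of vectors lends itself to a direct computation. The following sketch is an illustration only (the names `is_independent` and `maximal_independent_subsystem` are ours, and the independence test by Gaussian elimination is one convenient choice among several): it walks through the system once, keeping a vector whenever it is independent of those already kept. Every rejected vector is a linear combination of the kept ones, so the kept vectors form a maximal linearly independent subsystem and their number is the rank.

```python
from fractions import Fraction


def is_independent(vectors):
    """True if the given vectors are linearly independent."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        p = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for i in range(rank + 1, len(rows)):
            f = rows[i][col] / rows[rank][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank == len(rows)


def maximal_independent_subsystem(vectors):
    """Greedily pick a maximal linearly independent subsystem."""
    chosen = []
    for v in vectors:
        if is_independent(chosen + [v]):
            chosen.append(v)
    return chosen


# Sample system of four 3-dimensional vectors.
system = [(2, -2, -4), (1, 9, 3), (-2, -4, 1), (3, 7, -1)]
subsystem = maximal_independent_subsystem(system)
print(subsystem, "rank =", len(subsystem))
```

Processing the vectors in a different order may select a different subsystem, but by the corollary above the number of selected vectors is always the same.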
If we are given a system of n-dimensional vectors, it is natural
to ask whether this system of vectors is linearly dependent or not.
One cannot hope to find that in every specific instance the question
will be resolved without difficulty: a superficial examination of the
system of vectors

$$\alpha = (2, -5, 1, -1), \quad \beta = (1, 3, 6, 5), \quad \gamma = (-1, 4, 1, 2)$$

fails to reveal any linear dependences in it, though in reality these
vectors are connected by the relation

$$7\alpha - 3\beta + 11\gamma = 0$$
One way of settling this issue is given in Sec. 4. Since the com- 
ponents of the given vectors are known, we consider as unknown 
the coefficients of the desired linear dependence and obtain a system 
of homogeneous linear equations, which we solve by the Gaussian 
method. In this section we suggest a different approach, which will 
also bring us closer to our principal objective—the solution of arbitra- 
ry systems of linear equations. 
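
A relation of this kind is easy to confirm once its coefficients are known; the difficulty lies in finding them. As a small check, assuming the three vectors are $\alpha = (2, -5, 1, -1)$, $\beta = (1, 3, 6, 5)$, $\gamma = (-1, 4, 1, 2)$, the combination $7\alpha - 3\beta + 11\gamma$ can be evaluated component by component:

```python
alpha = (2, -5, 1, -1)
beta = (1, 3, 6, 5)
gamma = (-1, 4, 1, 2)

# Evaluate 7*alpha - 3*beta + 11*gamma one component at a time.
combination = tuple(7 * a - 3 * b + 11 * c
                    for a, b, c in zip(alpha, beta, gamma))
print(combination)
```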
Suppose we have an s by n matrix (s rows and n columns)

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\ldots & \ldots & \ldots & \ldots\\
a_{s1} & a_{s2} & \ldots & a_{sn}
\end{pmatrix}
$$


the numbers s and n not being related in any way. Regarded as 
s-dimensional vectors, the columns of this matrix may, generally 
speaking, be linearly dependent. The rank of the system of columns, 
that is the maximal number of linearly independent columns of 
matrix A (more precisely, the number of columns in any maximal 
linearly independent subsystem of the system of columns) is called 
the rank of the matrix. 

Naturally, in the same way the rows of matrix A may be regarded 
as n-dimensional vectors. It turns out that the rank of the system
of rows of the matrix is equal to the rank of the system of its columns, 
that is, it is equal to the rank of the matrix. The proof of this extre- 
mely unexpected assertion will be obtained after we point out 
yet another way of defining the rank of a matrix (which at the same 
time indicates a practical method of evaluation). 

Let us first generalize the concept of a minor to the case of
rectangular matrices. In matrix A we choose arbitrary k rows and k columns,
$k \le \min(s, n)$. The elements at the intersection of these rows and
columns constitute a square matrix of order k, the determinant of
which is called a kth-order minor of matrix A. We will now be
interested in the orders of those minors of A which differ from zero,
namely, the highest one of these orders. In searching for it, it is well
to bear in mind the following: if all kth-order minors of matrix A
are zero, then so also are all minors of higher order. Indeed, expanding
any minor of order $k + j$, $k < k + j \le \min(s, n)$, by the Laplace
theorem in terms of any k rows, we represent this minor as a sum
of minors of order k multiplied by certain minors of order j, thus
proving that it is zero.

Let us now prove the following theorem on the rank of a matrix. 

The highest order of nonzero minors of matrix A is equal to the 
rank of the matrix. 

Proof. Let the highest order of nonzero minors of matrix A
be r. Let us assume (there is no loss of generality) that the rth-order
minor D in the upper left corner of the matrix
is different from zero, $D \ne 0$. Then the first r columns of A will
be linearly independent: if there were a linear dependence among them,
then, since corresponding components are combined in the addition of
vectors, this same linear dependence would exist among the columns of
minor D and therefore D would be zero.

Now let us prove that each lth column of A, $r < l \le n$, is a linear
combination of the first r columns. We take any i, $1 \le i \le s$, and
construct an auxiliary determinant of order $r + 1$:


$$
\Delta_i = \begin{vmatrix}
a_{11} & \ldots & a_{1r} & a_{1l}\\
\ldots & \ldots & \ldots & \ldots\\
a_{r1} & \ldots & a_{rr} & a_{rl}\\
a_{i1} & \ldots & a_{ir} & a_{il}
\end{vmatrix}
$$

obtained by “bordering” the minor D by appropriate elements of the
lth column and the ith row. Determinant $\Delta_i$ is zero for any i. Indeed,
if $i > r$, then $\Delta_i$ is a minor of order $r + 1$ of our matrix A and
therefore is zero due to the choice of the number r. But if $i \le r$,
then $\Delta_i$ can no longer be a minor of matrix A since it cannot be
obtained by deleting from this matrix certain of its rows and columns;
however, determinant $\Delta_i$ now has two equal rows and, hence, is
again zero.

Let us examine the cofactors of the elements of the last row
of determinant $\Delta_i$. Obviously, the cofactor of the element $a_{il}$ is
minor D. But if $1 \le j \le r$, then for the cofactor of element $a_{ij}$
in $\Delta_i$ we have the number

$$
A_j = (-1)^{(r+1)+j} \begin{vmatrix}
a_{11} & \ldots & a_{1,j-1} & a_{1,j+1} & \ldots & a_{1r} & a_{1l}\\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots\\
a_{r1} & \ldots & a_{r,j-1} & a_{r,j+1} & \ldots & a_{rr} & a_{rl}
\end{vmatrix}
$$

It is not dependent on i and therefore is denoted by $A_j$. Thus,
expanding determinant $\Delta_i$ about its last row and equating this expansion
to zero, since $\Delta_i = 0$, we get

$$a_{i1}A_1 + a_{i2}A_2 + \ldots + a_{ir}A_r + a_{il}D = 0$$

whence, because $D \ne 0$,

$$a_{il} = -\frac{A_1}{D}a_{i1} - \frac{A_2}{D}a_{i2} - \ldots - \frac{A_r}{D}a_{ir}$$

This equation holds true for all i, $i = 1, 2, \ldots, s$, and since
its coefficients are not dependent on i, we find that the entire lth
column of A is a sum of the first r columns taken, respectively, with
the coefficients $-\frac{A_1}{D}, -\frac{A_2}{D}, \ldots, -\frac{A_r}{D}$.

72 CH. 2. SYSTEMS OF LINEAR. EQUATIONS 


In the system of columns of matrix A we have thus found a maxi- 
mal linearly independent subsystem consisting of r columns. This 
is proof that the rank of matrix A is equal to r, and it completes the 
proof of the rank theorem.

This theorem provides a practical method for computing the
rank of a matrix and therefore for settling the question of the exi- 
stence of linear dependence in a given system of vectors; forming 
a matrix for which the given vectors serve as columns and computing 
the rank of the matrix, we find the maximum number of linearly 
independent vectors of our system. 

The method of finding the rank of a matrix based on the rank 
theorem requires computing a finite but perhaps very large number 
of minors of the matrix. The following remark suggests a way of 
substantially simplifying this procedure. If the reader will again 
look through the proof of the rank theorem, he will notice that in 
the proof we did not take advantage of the fact that all minors of 
order (r + 1) of matrix A are equal to zero; actually, we used only 
those minors of order (r + 1) which border the given nonzero rth-order
minor D (that is, those which contain it completely within
themselves); for this reason, from the fact that only these minors
are equal to zero it follows that r is the maximum number of linearly
independent columns of matrix A; this implies that all minors of
order (r + 1) of this matrix are zero. We arrive at the following
rule for evaluating the rank of a matrix.

In computing the rank of a matrix, move from minors of smaller 
order to minors of greater order. If a nonzero kth-order minor D has 
already been found, then only the (k + 1)th-order minors bordering 
minor D need be computed; if they are all zero, the rank of the matrix 
is k. 
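
The rule can be sketched in a few lines of code. The version below is an illustration under our own assumptions: the names `det`, `minor` and `rank_by_bordering` are hypothetical, determinants are evaluated by elimination over exact fractions, and the search starts from the first nonzero element found. At each step only the minors bordering the current nonzero minor are examined.

```python
from fractions import Fraction


def det(m):
    """Determinant of a square matrix by elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign
        result *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return sign * result


def minor(a, rows, cols):
    """Minor of `a` standing in the given rows and columns."""
    return det([[a[i][j] for j in cols] for i in rows])


def rank_by_bordering(a):
    """Rank of `a`, found by growing a nonzero minor one border at a time."""
    s, n = len(a), len(a[0])
    start = next(((i, j) for i in range(s) for j in range(n) if a[i][j] != 0),
                 None)
    if start is None:
        return 0  # zero matrix
    rows, cols = [start[0]], [start[1]]
    while True:
        found = None
        for i in (i for i in range(s) if i not in rows):
            for j in (j for j in range(n) if j not in cols):
                if minor(a, sorted(rows + [i]), sorted(cols + [j])) != 0:
                    found = (i, j)
                    break
            if found is not None:
                break
        if found is None:
            return len(rows)  # every bordering minor is zero
        rows.append(found[0])
        cols.append(found[1])


sample = [[2, -4, 3, 1, 0],
          [1, -2, 1, -4, 2],
          [0, 1, -1, 3, 1],
          [4, -7, 4, -4, 5]]
print(rank_by_bordering(sample))
```

For an s by n matrix a nonzero minor of order k has at most (s - k)(n - k) bordering minors, which is usually far fewer than the total number of minors of order k + 1.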

Example 1. Find the rank of the matrix

$$
A = \begin{pmatrix}
2 & -4 & 3 & 1 & 0\\
1 & -2 & 1 & -4 & 2\\
0 & 1 & -1 & 3 & 1\\
4 & -7 & 4 & -4 & 5
\end{pmatrix}
$$

The second-order minor in the upper left corner of this matrix is zero.
However, the matrix also contains nonzero minors of order two, for instance,

$$d = \begin{vmatrix} -4 & 3\\ -2 & 1 \end{vmatrix} \ne 0$$

The third-order minor

$$
d' = \begin{vmatrix}
2 & -4 & 3\\
1 & -2 & 1\\
0 & 1 & -1
\end{vmatrix}
$$

bordering minor d is different from zero, $d' = 1$, but both fourth-order
minors bordering minor $d'$ are zero:

$$
\begin{vmatrix}
2 & -4 & 3 & 1\\
1 & -2 & 1 & -4\\
0 & 1 & -1 & 3\\
4 & -7 & 4 & -4
\end{vmatrix} = 0, \quad
\begin{vmatrix}
2 & -4 & 3 & 0\\
1 & -2 & 1 & 2\\
0 & 1 & -1 & 1\\
4 & -7 & 4 & 5
\end{vmatrix} = 0
$$

Thus, the rank of matrix A is three.
Example 2. Find a maximal linearly independent subsystem in the system
of vectors

$$\alpha_1 = (2, -2, -4), \quad \alpha_2 = (1, 9, 3), \quad \alpha_3 = (-2, -4, 1), \quad \alpha_4 = (3, 7, -1)$$

Form the matrix

$$
\begin{pmatrix}
2 & 1 & -2 & 3\\
-2 & 9 & -4 & 7\\
-4 & 3 & 1 & -1
\end{pmatrix}
$$

in which the given vectors are columns. The rank of this matrix is two: the
second-order minor in the upper left corner is nonzero, but both third-order
minors bordering it are zero. From this it follows that the vectors
$\alpha_1$, $\alpha_2$ form one of the maximal linearly independent subsystems
of the given system.


As a corollary to the rank theorem, we now prove an assertion 
that was stated earlier. 

The maximum number of linearly independent rows of any matrix
is equal to the maximum number of its linearly independent columns,
which means that it is equal to the rank of the matrix.

To prove this, take the transpose of the matrix (that is, inter- 
change rows and columns retaining the subscripts of the elements). 
In taking the transpose, the maximal order of nonzero minors of 
the matrix cannot change since taking transposes does not change 
the determinant, and for any minor of the original matrix the minor 
obtained from it by taking the transpose is in the new matrix, and 
conversely. Whence it follows that the rank of the new matrix is 
equal to the rank of the original matrix; it is also equal to the maxi- 
mum number of linearly independent columns of the new matrix 
(or the maximum number of linearly independent rows of the ori- 
ginal matrix). 
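
The corollary is easy to test numerically. In the sketch below (function names `rank` and `transpose` are ours, and the sample matrix is an arbitrary choice) the rank is computed by row elimination, so that applying it to a matrix counts independent rows and applying it to the transpose counts independent columns of the original matrix.

```python
from fractions import Fraction


def rank(a):
    """Number of linearly independent rows of `a`, by Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in a]
    r = 0
    for col in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]


sample = [[2, -4, 3, 1, 0],
          [1, -2, 1, -4, 2],
          [0, 1, -1, 3, 1],
          [4, -7, 4, -4, 5]]
print(rank(sample), rank(transpose(sample)))
```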


Example. In Sec. 8 we introduced the concept of a linear form in n
unknowns and defined addition of linear forms and their multiplication by
a scalar. This definition permits extending to linear forms the concept of
linear dependence with all its properties.

Let there be a system of linear forms

$$
\begin{aligned}
f_1 &= x_1 + 2x_2 + x_3 + 3x_4,\\
f_2 &= 4x_1 - x_2 - 5x_3 - 6x_4,\\
f_3 &= x_1 - 3x_2 - 4x_3 - 7x_4,\\
f_4 &= 2x_1 + x_2 - x_3
\end{aligned}
$$

In it we have to choose a maximal linearly independent subsystem.
Form the matrix of the coefficients of these forms:

$$
\begin{pmatrix}
1 & 2 & 1 & 3\\
4 & -1 & -5 & -6\\
1 & -3 & -4 & -7\\
2 & 1 & -1 & 0
\end{pmatrix}
$$

and find its rank. The second-order minor in the upper left corner is nonzero,
but, as can easily be verified, all four third-order minors bordering it are zero.
Whence it follows that the first two rows of our matrix are linearly independent,
and the third and fourth are linear combinations of them. Hence, the system
$f_1$, $f_2$ is the desired subsystem of the given system of linear forms.


There is yet another important consequence of the rank theorem. 

An nth-order determinant is equal to zero if and only if there is 
a linear dependence among its rows. 

This assertion has already been proved in one direction in Sec. 4
(Property 8). Now let there be given an nth-order determinant equal
to zero; in other words, suppose we have a square matrix of order
n whose only minor having maximal order is zero. It then follows that
the highest order of the nonzero minors of this matrix is less than n,
that is, the rank is less than n, and so, on the basis of the foregoing
proof, the rows of this matrix are linearly dependent.

Quite naturally, this corollary can be stated with columns taken
instead of rows.

There is yet another way to compute the rank of a matrix which 
is not connected with the rank theorem and does not require evaluat- 
ing determinants. Incidentally, it is only applicable when we wish 
to know only the rank itself and are not interested in precisely which 
columns (or rows) comprise the maximal linearly independent system. 
The procedure is this. 

We use the term elementary transformations of a matrix A for the 
following transformations: 

(a) interchange (transposition) of two rows or two columns; 

(b) multiplication of a row (or a column) by an arbitrary non- 
zero scalar; 

(c) addition of a multiple of one row (or column) to another row
(column).

Clearly, elementary transformations do not change the rank of a
matrix. Indeed, if these transformations are applied, say, to the
columns of a matrix, the system of columns (regarded as vectors) is
replaced by an equivalent system. We prove it for transformation (c)
since for (a) and (b) it is obvious. Let the jth column multiplied by
a number k be added to the ith column. If, prior to the manipulation,
the vectors

$$\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_n \tag{1}$$

served as columns of the matrix, then after the manipulation the
vectors

$$\alpha_1, \ldots, \alpha_i' = \alpha_i + k\alpha_j, \ldots, \alpha_j, \ldots, \alpha_n \tag{2}$$

will form the columns of the matrix. System (2) is expressible
linearly in terms of system (1), and the equation

$$\alpha_i = \alpha_i' - k\alpha_j$$

shows that (1), in turn, is linearly expressible in terms of (2).
Consequently, these systems are equivalent and for this reason their
maximal linearly independent subsystems consist of the same number of
vectors.

Thus, when computing the rank of a matrix, the matrix may 
first be simplified by means of a combination of elementary trans- 
formations. 

We say that an s by n matrix has diagonal form if all its elements
are zero except the elements $a_{11}, a_{22}, \ldots, a_{rr}$ [where
$0 \le r \le \min(s, n)$], which are equal to unity. The rank of this matrix
is obviously r.

Using elementary transformations, it is possible to reduce any 
matrix to diagonal form. 

Indeed, suppose we have a matrix

$$
A = \begin{pmatrix}
a_{11} & \ldots & a_{1n}\\
\ldots & \ldots & \ldots\\
a_{s1} & \ldots & a_{sn}
\end{pmatrix}
$$

If all the elements are zero, then it already has diagonal form. But
if there are nonzero elements, then an interchange of rows and
columns will change element $a_{11}$ to a nonzero element. Then by
multiplying the first row by $a_{11}^{-1}$, we convert element $a_{11}$ to unity.
And if we now subtract from the jth column, $j > 1$, the first column
multiplied by $a_{1j}$, then element $a_{1j}$ will be replaced by a zero.
Manipulating in similar fashion all columns beyond the first, and
also all rows, we arrive at a matrix of the form

$$
A' = \begin{pmatrix}
1 & 0 & \ldots & 0\\
0 & a'_{22} & \ldots & a'_{2n}\\
\ldots & \ldots & \ldots & \ldots\\
0 & a'_{s2} & \ldots & a'_{sn}
\end{pmatrix}
$$

Performing the same manipulations with the submatrix that remains
in the lower right corner, and so on, we finally (after a finite number
of manipulations) arrive at a diagonal matrix with the same rank
as the original matrix A.
Thus, to find the rank of a matrix it is necessary to convert the
matrix, by means of elementary transformations, to diagonal form and
count the number of units in the principal diagonal.
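
The reduction just described can be sketched as follows. This is only one possible arrangement of the steps (the function name `to_diagonal_form` is ours, and the pivot is simply the first nonzero element met in the remaining submatrix); each operation performed is one of the elementary transformations (a), (b), (c).

```python
from fractions import Fraction


def to_diagonal_form(a):
    """Reduce `a` to diagonal form; return the reduced matrix and its rank."""
    m = [[Fraction(x) for x in row] for row in a]
    s, n = len(m), len(m[0])
    r = 0
    while r < min(s, n):
        # Look for a nonzero element in the remaining lower right submatrix.
        pos = next(((i, j) for i in range(r, s) for j in range(r, n)
                    if m[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        m[r], m[i] = m[i], m[r]                  # (a) interchange two rows
        for row in m:                            # (a) interchange two columns
            row[r], row[j] = row[j], row[r]
        lead = m[r][r]
        m[r] = [x / lead for x in m[r]]          # (b) make the pivot unity
        for c in range(r + 1, n):                # (c) clear the rest of row r
            f = m[r][c]
            if f != 0:
                for row in m:
                    row[c] -= f * row[r]
        for t in range(r + 1, s):                # (c) clear the rest of column r
            f = m[t][r]
            if f != 0:
                m[t] = [x - f * y for x, y in zip(m[t], m[r])]
        r += 1
    return m, r


sample = [[0, 2, -4],
          [-1, -4, 5],
          [3, 1, 7],
          [0, 5, -10],
          [2, 3, 0]]
reduced, rank = to_diagonal_form(sample)
print(rank)
```

Exact fractions are used so that an element which should vanish is recognized as zero without rounding questions.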

Example. Find the rank of the matrix

$$
A = \begin{pmatrix}
0 & 2 & -4\\
-1 & -4 & 5\\
3 & 1 & 7\\
0 & 5 & -10\\
2 & 3 & 0
\end{pmatrix}
$$

Interchanging the first and second columns and multiplying the first row
by the number $\frac{1}{2}$, we get the matrix

$$
\begin{pmatrix}
1 & 0 & -2\\
-4 & -1 & 5\\
1 & 3 & 7\\
5 & 0 & -10\\
3 & 2 & 0
\end{pmatrix}
$$

Adding two times the first column to the third column and then adding some
multiple of the new first row to each of the remaining rows, we get the matrix

$$
\begin{pmatrix}
1 & 0 & 0\\
0 & -1 & -3\\
0 & 3 & 9\\
0 & 0 & 0\\
0 & 2 & 6
\end{pmatrix}
$$

Finally, multiplying the second row by $-1$, subtracting from the third column
three times the second column, and then subtracting from the third and fifth
rows certain multiples of the new second row, we arrive at the desired diagonal
form

$$
\begin{pmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 0\\
0 & 0 & 0\\
0 & 0 & 0
\end{pmatrix}
$$

The rank of the matrix A is thus two.

In Chapter 13 we will again encounter elementary transformations and 
diagonal matrices; true, these will be matrices in which the elements are
polynomials, not numbers.


11. Systems of Linear Equations


We now begin the study of arbitrary systems of linear equations 
without any assumptions concerning the number of equations of 
a system being equal to the number of unknowns. Incidentally, the 
results we achieve will be applicable to the case (not considered 
in Sec. 7) when the number of equations is equal to the number of 
unknowns, but the determinant of the system is zero. 
Suppose we have a system of linear equations

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= b_2,\\
\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots &\\
a_{s1}x_1 + a_{s2}x_2 + \ldots + a_{sn}x_n &= b_s
\end{aligned}
\tag{1}
$$


As we know from Sec. 4, the first thing is to decide whether the
system is consistent or not. For this purpose, take the coefficient
matrix A of the system and the augmented matrix $\bar{A}$ obtained by
adjoining to A a column made up of the constant terms,

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n}\\
a_{21} & a_{22} & \ldots & a_{2n}\\
\ldots & \ldots & \ldots & \ldots\\
a_{s1} & a_{s2} & \ldots & a_{sn}
\end{pmatrix}, \quad
\bar{A} = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} & b_1\\
a_{21} & a_{22} & \ldots & a_{2n} & b_2\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
a_{s1} & a_{s2} & \ldots & a_{sn} & b_s
\end{pmatrix}
$$

and evaluate the ranks of these matrices. It is easy to see that the
rank of matrix $\bar{A}$ is either equal to the rank of matrix A or exceeds the
latter by unity. Indeed, take a certain maximal linearly independent
system of columns of matrix A. It will also be linearly independent
in matrix $\bar{A}$. If it also retains the property of maximality, that is,
the column of the constant terms is expressible linearly in terms of it,
then the ranks of matrices A and $\bar{A}$ are equal; otherwise, adjoining
to this system the column made up of constant terms yields a linearly
independent system of columns of matrix $\bar{A}$, which is maximal in it.

The question of consistency of a system of linear equations is fully
resolved by the following theorem.

Kronecker-Capelli theorem. A system of linear equations (1) is
consistent if and only if the rank of the augmented matrix $\bar{A}$ is equal
to the rank of the matrix A.

Proof. 1. Let system (1) be consistent and let $k_1, k_2, \ldots, k_n$
be one of its solutions. Substituting these numbers, in place of the
unknowns, into (1), we get s identities, which show that the last
column of $\bar{A}$ is the sum of all the remaining columns taken,
respectively, with the coefficients $k_1, k_2, \ldots, k_n$. Any other column
of $\bar{A}$ is also in A and therefore is expressible linearly in terms of all
the columns of A. Conversely, any column of matrix A is
a column of $\bar{A}$ as well, that is, it is linearly expressible in terms of
the columns of $\bar{A}$. From this it follows that the systems
of columns of matrices A and $\bar{A}$ are equivalent and therefore, as
proved at the end of Sec. 9, both these systems of s-dimensional
vectors have one and the same rank; in other words, the ranks of
the matrices A and $\bar{A}$ are equal.

2. Now suppose that the matrices A and $\bar{A}$ have equal ranks.
It then follows that any maximal linearly independent system of
columns of A remains a maximal linearly independent system in
matrix $\bar{A}$ as well. For this reason, the last column of $\bar{A}$ can be
expressed linearly in terms of this system and therefore, generally, in
terms of the system of columns of matrix A. Consequently, there exists
a system of coefficients $k_1, k_2, \ldots, k_n$ such that the sum of the
columns of A taken with these coefficients is equal to the column of
constant terms, and therefore the numbers $k_1, k_2, \ldots, k_n$ constitute
a solution of system (1). Thus, coincidence of the ranks of matrices A
and $\bar{A}$ implies that system (1) is consistent.

The proof is complete. In practical situations, it is first necessary
to compute the rank of matrix A; to do this, find one of the nonzero
minors of the matrix such that all the minors bordering it are zero.
Let it be the minor M. Then compute all the minors of matrix $\bar{A}$
bordering M but not contained in A [the so-called characteristic
determinants of system (1)]. If they are all zero, then the rank of
matrix $\bar{A}$ is equal to the rank of matrix A and therefore system (1)
is consistent; otherwise it is not consistent. Thus, the Kronecker-
Capelli theorem may be stated as follows: a system of linear equations
(1) is consistent if and only if all its characteristic determinants are
equal to zero.
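
The consistency test of the theorem can be carried out mechanically. The sketch below compares the two ranks directly rather than evaluating characteristic determinants; the function names `rank` and `is_consistent` and the two sample systems are our own choices for illustration.

```python
from fractions import Fraction


def rank(a):
    """Rank of a matrix, computed by Gaussian elimination on its rows."""
    rows = [[Fraction(x) for x in r] for r in a]
    r = 0
    for col in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def is_consistent(a, b):
    """Kronecker-Capelli test: compare rank of A with rank of the augmented matrix."""
    augmented = [list(row) + [bi] for row, bi in zip(a, b)]
    return rank(a) == rank(augmented)


# Sample system 1: three equations in four unknowns.
a1 = [[5, -1, 2, 1], [2, 1, 4, -2], [1, -3, -6, 5]]
b1 = [7, 1, 0]
# Sample system 2: three equations in five unknowns.
a2 = [[1, 1, -2, -1, 1], [3, -1, 1, 4, 3], [1, 5, -9, -8, 1]]
b2 = [1, 4, 0]

print(is_consistent(a1, b1), is_consistent(a2, b2))
```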

Let us now suppose that system (1) is consistent. The Kronecker- 
Capelli theorem which we used to establish the consistency of this 
system states that a solution exists. However, it does not give us 
any practical method for finding all the solutions of the system. We 
shall now investigate this problem. 

Let matrix A have rank r. As was proved in the preceding section,
r is equal to the maximum number of linearly independent rows
of matrix A. To be specific let the first r rows of A be linearly
independent, and let each of the remaining rows be a linear combination
of them. Then the first r rows of $\bar{A}$ will also be linearly independent:
any linear dependence between them would also be a linear
dependence among the first r rows of A (recall the definition of addition
of vectors!). From coincidence of the ranks of matrices A and $\bar{A}$ it
follows that the first r rows of $\bar{A}$ constitute, in it, a maximal linearly
independent system of rows; in other words, any other row of this
matrix is a linear combination of them.

It follows, then, that any equation of system (1) can be represented
as a sum of the first r equations taken with certain coefficients
and therefore any solution of the first r equations will satisfy
all the equations of (1). Consequently, it suffices to find all the
solutions of the system

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= b_2,\\
\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots &\\
a_{r1}x_1 + a_{r2}x_2 + \ldots + a_{rn}x_n &= b_r
\end{aligned}
\tag{2}
$$


Since the rows of coefficients of the unknowns in equations (2)
are linearly independent, that is, the matrix of the coefficients has
rank r, it follows that $r \le n$ and, besides, that at least one of the
minors of order r of this matrix is nonzero. If $r = n$, then (2) is
a system with an equal number of equations and unknowns and
with a nonzero determinant; that is, it, and for this reason system (1)
as well, has a unique solution, namely, that which is calculable
by Cramer's rule.

Now let $r < n$ and, for definiteness, let the rth-order minor
made up of the coefficients of the first r unknowns be different from
zero. In each of the equations of (2), transpose to the right side all
terms with the unknowns $x_{r+1}, \ldots, x_n$ and for these unknowns
select certain values $c_{r+1}, \ldots, c_n$. We obtain a system of r equations

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1r}x_r &= b_1 - a_{1,r+1}c_{r+1} - \ldots - a_{1n}c_n,\\
a_{21}x_1 + a_{22}x_2 + \ldots + a_{2r}x_r &= b_2 - a_{2,r+1}c_{r+1} - \ldots - a_{2n}c_n,\\
\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots &\\
a_{r1}x_1 + a_{r2}x_2 + \ldots + a_{rr}x_r &= b_r - a_{r,r+1}c_{r+1} - \ldots - a_{rn}c_n
\end{aligned}
\tag{3}
$$

in the r unknowns $x_1, x_2, \ldots, x_r$. Cramer's rule is applicable to it and
therefore the system has a unique solution $c_1, c_2, \ldots, c_r$; it is
obvious that the set of numbers $c_1, c_2, \ldots, c_r, c_{r+1}, \ldots, c_n$ will
serve as a solution of system (2). Since the values $c_{r+1}, \ldots, c_n$
for the unknowns $x_{r+1}, \ldots, x_n$, called free unknowns, can be
chosen in arbitrary fashion, we obtain an infinity of distinct solutions
of system (2).

On the other hand, any solution of (2) may be obtained in the
indicated way: if some solution $c_1, c_2, \ldots, c_n$ of (2) is given, then
we take the numbers $c_{r+1}, \ldots, c_n$ for the values of the free unknowns.
Then the numbers $c_1, c_2, \ldots, c_r$ will satisfy system (3) and
therefore will constitute the only solution of the system, which solution
is computed by Cramer's rule.

The foregoing may be combined into a rule for the solution of 
an arbitrary system of linear equations. 

Let there be a consistent system of linear equations (1) and let the
matrix A of the coefficients have rank r. In A we choose r linearly
independent rows and leave in (1) only those equations whose coefficients
lie in the chosen rows. In these equations we leave in the left members
r unknowns such that the determinant of their coefficients is nonzero;
the remaining unknowns are called free and are transposed to the right
sides of the equations. Assigning arbitrary numerical values to the free
unknowns and computing the values of the remaining unknowns by
Cramer's rule, we obtain all the solutions of system (1).
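
The last step of the rule, computing the non-free unknowns by Cramer's rule once the free ones have been assigned values, can be sketched as follows. The names `det` and `solve_with_free`, the argument layout, and the sample equations are our own assumptions; the r independent equations and the r unknowns with nonzero determinant are supposed to have been chosen beforehand.

```python
from fractions import Fraction


def det(m):
    """Determinant by expansion about the first row (adequate for small orders)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def solve_with_free(a, b, basic, free_values):
    """Solve r independent equations once the free unknowns are assigned.

    a, b        -- coefficients and constant terms of the r chosen equations
    basic       -- indices of r unknowns whose coefficient determinant is nonzero
    free_values -- dict mapping the index of each free unknown to its value
    """
    r = len(a)
    # Transpose the terms with free unknowns to the right sides, as in (3).
    rhs = [Fraction(b[i]) - sum(Fraction(a[i][j]) * v
                                for j, v in free_values.items())
           for i in range(r)]
    core = [[Fraction(a[i][j]) for j in basic] for i in range(r)]
    d = det(core)
    if d == 0:
        raise ValueError("the chosen unknowns have a zero determinant")
    x = dict(free_values)
    for pos, j in enumerate(basic):
        # Cramer's rule: replace column `pos` by the right sides.
        replaced = [row[:pos] + [rhs[i]] + row[pos + 1:]
                    for i, row in enumerate(core)]
        x[j] = det(replaced) / d
    return [x[j] for j in range(len(a[0]))]


# Two independent sample equations in five unknowns; x3, x4, x5 are free.
a = [[1, 1, -2, -1, 1],
     [1, 5, -9, -8, 1]]
b = [1, 0]
print(solve_with_free(a, b, basic=[0, 1], free_values={2: 3, 3: 0, 4: 0}))
```

Calling the function again with other values of the free unknowns produces further solutions of the same system.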

We also state the following result that we have obtained. 

A consistent system (1) has a unique solution if and only if the rank 
of matrix A is equal to the number of unknowns. 


Example 1. Solve the system

$$
\begin{aligned}
5x_1 - x_2 + 2x_3 + x_4 &= 7,\\
2x_1 + x_2 + 4x_3 - 2x_4 &= 1,\\
x_1 - 3x_2 - 6x_3 + 5x_4 &= 0
\end{aligned}
$$

The rank of the coefficient matrix is two: the second-order minor in the upper
left corner of this matrix is nonzero, but both third-order minors bordering it
are zero. The rank of the augmented matrix is three, since

$$
\begin{vmatrix}
5 & -1 & 7\\
2 & 1 & 1\\
1 & -3 & 0
\end{vmatrix} = -35 \ne 0
$$

The system is thus inconsistent.
Example 2. Solve the system 


The rank of the coefficient matrix is two, i.e., it is equal to the number 
of unknowns; the rank of the augmented matrix is also two. Thus, the system 
is consistent and has a unique solution. The left-hand sides of the first two 
equations are linearly independent; solving the system of these two equations, 
we get the values 

Geer, Bee? 
a a aa 
for the unknowns. It is easy to see that this solution also satisfies the third
equation.
Example 3. Solve the system 


$$\begin{aligned}
x_1 + x_2 - 2x_3 - x_4 + x_5 &= 1, \\
3x_1 - x_2 + x_3 + 4x_4 + 3x_5 &= 4, \\
x_1 + 5x_2 - 9x_3 - 8x_4 + x_5 &= 0
\end{aligned}$$


The system is consistent since the rank of the augmented matrix (like
the rank of the matrix of coefficients) is two. The left members of the first and
third equations are linearly independent since the coefficients of the unknowns
$x_1$ and $x_2$ constitute a nonzero minor of order two. Solve the system of these
two equations, the unknowns $x_3$, $x_4$, $x_5$ being considered free; transpose them
to the right members of the equations and assume that they have been given
certain numerical values. Using Cramer’s rule, we get

$$x_1 = \frac{5}{4} + \frac{1}{4}x_3 - \frac{3}{4}x_4 - x_5, \qquad
x_2 = -\frac{1}{4} + \frac{7}{4}x_3 + \frac{7}{4}x_4$$

These equations determine the general solution of the given system: assigning
arbitrary numerical values to the free unknowns, we obtain all the solutions
of our system. Thus, for example, the vectors (2, 5, 3, 0, 0), (3, 5, 2, 1, -2),
(0, -1/4, -1, 1, 1/4) and so on are solutions of our system. On the
other hand, substituting the expressions for $x_1$ and $x_2$ from the general solution
into any one of the equations of the system, say the second, which was earlier
rejected, we obtain an identity.

Example 4. Solve the system

$$\begin{aligned}
4x_1 + x_2 - 2x_3 + x_4 &= 3, \\
x_1 - 2x_2 - x_3 + 2x_4 &= 2, \\
2x_1 + 5x_2 - x_4 &= -1, \\
3x_1 + 3x_2 - x_3 - 3x_4 &= 1
\end{aligned}$$

Although the number of equations is equal to the number of unknowns,
the determinant of the system is zero and, therefore, Cramer’s rule is not
applicable. The rank of the coefficient matrix is equal to three: in the upper right
corner of this matrix is a nonzero third-order minor. The rank of the augmented
matrix is also three, so the system is consistent. Considering only the first
three equations and taking the unknown $x_1$ as free, we obtain the general
solution in the form

$$x_2 = -\frac{1}{5} - \frac{2}{5}x_1, \qquad x_3 = -\frac{8}{5} + \frac{9}{5}x_1, \qquad x_4 = 0$$

Example 5. Suppose we have a system consisting of n + 1 equations in n
unknowns. The augmented matrix $\bar{A}$ of this system is a square matrix of order
n + 1. If our system is consistent, then, by the Kronecker-Capelli theorem,
the rank of $\bar{A}$ equals the rank of the coefficient matrix A, which does not
exceed n, and so the determinant of $\bar{A}$ must be zero.
Thus, let there be a system

$$\begin{aligned}
x_1 - 8x_2 &= 3, \\
2x_1 + x_2 &= 1, \\
4x_1 + 7x_2 &= -4
\end{aligned}$$

The determinant of the coefficients and the constant terms of these equations
is different from zero:

$$\begin{vmatrix} 1 & -8 & 3 \\ 2 & 1 & 1 \\ 4 & 7 & -4 \end{vmatrix} = -77$$

The system is therefore inconsistent.
The converse, generally speaking, is not true: from the determinant of
matrix $\bar{A}$ being zero it does not follow that the ranks of matrices A and $\bar{A}$
coincide.






12. Systems of Homogeneous Linear Equations 


Let us apply the findings of the preceding section to the case of 
a system of homogeneous linear equations: 


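$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= 0, \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= 0, \\
\dots\dots\dots\dots\dots\dots\dots\dots\dots & \\
a_{s1}x_1 + a_{s2}x_2 + \dots + a_{sn}x_n &= 0
\end{aligned} \tag{1}$$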


From the Kronecker-Capelli theorem it follows that this system
is always consistent, since adding a column of zeros cannot raise
the rank of the matrix. This incidentally is evident by a simple
inspection: system (1) definitely has the trivial solution (0, 0, ..., 0).

Let the coefficient matrix A of system (1) have rank r. If r = n,
then the trivial solution will be the only solution of (1); for r < n,
the system has also nontrivial solutions; to find all these solutions, use 
the same technique as above in the case of an arbitrary system of 
equations. In particular, a system of n homogeneous linear equations 
in n unknowns has nontrivial solutions if and only if the determinant 
of the system is zero.* Indeed, the fact that the determinant is zero 
is equivalent to the assertion that the rank of matrix A is less than n. 
On the other hand, if in a system of homogeneous equations the number 
of equations is less than the number of unknowns, then the system must 
definitely have solutions different from zero, since in that case the rank 
cannot be equal to the number of unknowns. This was already 
obtained in Sec. 1 by other reasoning. 

Let us, for example, examine the case of a system consisting of
$n - 1$ homogeneous equations in n unknowns; assume that the left
members of these equations are linearly independent among themselves.
Let

$$A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n-1,1} & a_{n-1,2} & \dots & a_{n-1,n}
\end{pmatrix}$$

be the matrix of the coefficients of this system. Denote by $M_i$ the
minor of order $n - 1$ obtained by deleting the ith column from A,
i = 1, 2, ..., n. Then for one of the solutions of our system we have
the set of numbers

$$M_1,\ -M_2,\ M_3,\ -M_4,\ \dots,\ (-1)^{n-1}M_n \tag{2}$$

and any other solution is proportional to it.


* One half of this assertion was already proved in Sec. 7.

Proof. Since, by hypothesis, the rank of matrix A is $n - 1$,
one of the minors $M_i$ must be nonzero; let it be $M_n$. Assume the
unknown $x_n$ to be free and transpose it to the right side of each of the
equations. We then get

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1,n-1}x_{n-1} &= -a_{1n}x_n, \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2,n-1}x_{n-1} &= -a_{2n}x_n, \\
\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots & \\
a_{n-1,1}x_1 + a_{n-1,2}x_2 + \dots + a_{n-1,n-1}x_{n-1} &= -a_{n-1,n}x_n
\end{aligned}$$

Applying Cramer’s rule, we obtain the general solution of the given
system of equations, which, after simple manipulations, becomes

$$x_i = (-1)^{n-i}\,\frac{M_i}{M_n}\,x_n, \qquad i = 1, 2, \dots, n-1 \tag{3}$$

Setting $x_n = (-1)^{n-1}M_n$, we obtain $x_i = (-1)^{2n-i-1}M_i$,
i = 1, 2, ..., n - 1, or, since the difference $(2n - i - 1) - (i - 1) = 2n - 2i$
is an even number, $x_i = (-1)^{i-1}M_i$, that is, the set
of numbers (2) will indeed be a solution of our system of equations.
Any other solution of this system is obtained from formulas (3)
for a different numerical value of the unknown $x_n$ and so is proportional
to solution (2). This assertion clearly holds true for the case
when $M_n = 0$, but one of the minors $M_i$, $1 \le i \le n - 1$, is nonzero.
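
The solution (2) built from signed minors is easy to check on a small case. The sketch below is an illustration only: the names `det` and `minors_solution` and the 2 × 3 coefficient matrix are hypothetical choices, not taken from the text.

```python
from fractions import Fraction

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if not m:
        return Fraction(1)  # determinant of the empty matrix
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def minors_solution(a):
    """For an (n-1) x n matrix a return (M_1, -M_2, ..., (-1)^(n-1) M_n)."""
    n = len(a[0])
    solution = []
    for i in range(n):
        # M_(i+1): delete column i (0-based) and take the determinant
        m_i = det([row[:i] + row[i + 1:] for row in a])
        solution.append((-1) ** i * m_i)
    return solution

# Hypothetical system: x1 + 2*x2 + 3*x3 = 0, 4*x1 + 5*x2 + 6*x3 = 0
a = [[1, 2, 3],
     [4, 5, 6]]
x = minors_solution(a)
print(x)
print([sum(c * v for c, v in zip(row, x)) for row in a])
```

The second printed list contains the left-hand sides of the equations evaluated at the computed vector; they should all vanish.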

Solutions of a system of homogeneous linear equations have
the following properties. If the vector $\beta = (b_1, b_2, \dots, b_n)$ is
a solution of system (1), then for any scalar k the vector
$k\beta = (kb_1, kb_2, \dots, kb_n)$ is also a solution of the system. This is
verified directly by substitution into any one of the equations (1). If the
vector $\gamma = (c_1, c_2, \dots, c_n)$ is another solution of (1), then the vector
$\beta + \gamma = (b_1 + c_1, b_2 + c_2, \dots, b_n + c_n)$ is also a solution of the
system:

$$\sum_{j=1}^{n} a_{ij}(b_j + c_j) = \sum_{j=1}^{n} a_{ij}b_j + \sum_{j=1}^{n} a_{ij}c_j = 0, \qquad i = 1, 2, \dots, s$$

Thus, generally, any linear combination of solutions of the homogeneous
system (1) is a solution of the system. Note that in the case of a
nonhomogeneous system, that is, a system of linear equations whose
constant terms are not all equal to zero, no such assertion is true:
neither the sum of two solutions of a system of nonhomogeneous
equations nor the product of a solution of the system by a scalar
different from 1 can serve as a solution of the system.

From Sec. 9 we know that any system of n-dimensional vectors
consisting of more than n vectors will be linearly dependent. Whence
it follows that from a number of solutions of the homogeneous system
(1), which solutions, as we know, are n-dimensional vectors, it is
possible to choose a finite maximal linearly independent system,
that is, maximal in the sense that any other solution of system (1)
will be a linear combination of the solutions that enter into the chosen
system. Any maximal linearly independent system of solutions of
the homogeneous system of equations (1) is called its fundamental
system of solutions.

Let us once again stress the fact that an n-dimensional vector is
a solution of system (1) if and only if it is a linear combination of vectors
comprising the given fundamental system.

Quite naturally, the fundamental system exists only if system (1)
has nontrivial solutions, that is, if the rank of its matrix of
coefficients is less than the number of unknowns. Then system (1) can
have many different fundamental systems of solutions. All these
systems are equivalent however, since each vector of any one of the
systems is linearly expressible in terms of any other system, and
for this reason the systems consist of one and the same number of
solutions.

The following theorem is valid. 

If the rank r of the coefficient matrix of the system of homogeneous 
linear equations (1) is less than the number of unknowns n, then any 
fundamental system of solutions of (1) consists of n — r solutions. 

To prove this, note that $n - r$ is the number of free unknowns
in system (1); let the unknowns $x_{r+1}, x_{r+2}, \dots, x_n$ be free. We
consider an arbitrary nonzero determinant d of order $n - r$, which
we write as follows:

$$d = \begin{vmatrix}
c_{1,r+1} & c_{1,r+2} & \dots & c_{1n} \\
c_{2,r+1} & c_{2,r+2} & \dots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n-r,r+1} & c_{n-r,r+2} & \dots & c_{n-r,n}
\end{vmatrix}$$

Taking elements of the ith row of this determinant, $1 \le i \le n - r$,
for the values of the free unknowns, we get unique values for the
unknowns $x_1, x_2, \dots, x_r$. In other words, we arrive at a quite
definite solution of the system (1) of equations. Let us write the solution
in the form of a vector:

$$\alpha_i = (c_{i1}, c_{i2}, \dots, c_{ir}, c_{i,r+1}, c_{i,r+2}, \dots, c_{in})$$



The set of vectors $\alpha_1, \alpha_2, \dots, \alpha_{n-r}$ that we have obtained
serves as a fundamental system of solutions for the system (1) of
equations. Indeed, this set of vectors is linearly independent since
the matrix made up of them (as rows) contains a nonzero minor d
of order $n - r$. On the other hand, let

$$\beta = (b_1, b_2, \dots, b_r, b_{r+1}, b_{r+2}, \dots, b_n)$$

be an arbitrary solution of system (1). We will prove that the vector $\beta$
can be expressed linearly in terms of the vectors $\alpha_1, \alpha_2, \dots, \alpha_{n-r}$.

Denote by $\alpha_i'$, i = 1, 2, ..., n - r, the ith row of the determinant d;
regard this row as an $(n - r)$-dimensional vector. Then set

$$\beta' = (b_{r+1}, b_{r+2}, \dots, b_n)$$

The vectors $\alpha_i'$, i = 1, 2, ..., n - r, are linearly independent
since $d \neq 0$. However, the system of $(n - r)$-dimensional vectors

$$\alpha_1', \alpha_2', \dots, \alpha_{n-r}', \beta'$$

is linearly dependent since the number of vectors in it is greater than
their dimensionality. Hence there are scalars $k_1, k_2, \dots, k_{n-r}$ such
that

$$\beta' = k_1\alpha_1' + k_2\alpha_2' + \dots + k_{n-r}\alpha_{n-r}' \tag{4}$$

Now consider the n-dimensional vector

$$\delta = k_1\alpha_1 + k_2\alpha_2 + \dots + k_{n-r}\alpha_{n-r} - \beta$$

Since the vector $\delta$ is a linear combination of the solutions of the
system (1) of homogeneous equations, it will be a solution of the
system. From (4) it follows that in the solution $\delta$ the values of
all the free unknowns are zero. However, the unique solution of
system (1) which is obtained for zero values of the free unknowns
will be the trivial solution. Thus, $\delta = 0$, that is,

$$\beta = k_1\alpha_1 + k_2\alpha_2 + \dots + k_{n-r}\alpha_{n-r}$$

which proves the theorem.

Note that the foregoing proof permits us to assert that we will 
obtain all the fundamental systems of solutions of the system (1) 
of homogeneous equations by taking for d all possible nonzero deter- 
minants of order n — r. 


Example. Given the following system of homogeneous linear equations:

$$\begin{aligned}
3x_1 + x_2 - 8x_3 + 2x_4 + x_5 &= 0, \\
2x_1 - 2x_2 - 3x_3 - 7x_4 + 2x_5 &= 0, \\
x_1 + 11x_2 - 12x_3 + 34x_4 - 5x_5 &= 0, \\
x_1 - 5x_2 + 2x_3 - 16x_4 + 3x_5 &= 0
\end{aligned}$$

The rank of the coefficient matrix is two, the number of unknowns is equal
to five; therefore every fundamental system of solutions of this system of
equations consists of three solutions. We solve the system confining ourselves to the
first two linearly independent equations and considering $x_3$, $x_4$, $x_5$ as free
unknowns. We obtain the general solution in the form

$$x_1 = \frac{19}{8}x_3 + \frac{3}{8}x_4 - \frac{1}{2}x_5, \qquad
x_2 = \frac{7}{8}x_3 - \frac{25}{8}x_4 + \frac{1}{2}x_5$$

Then we take the following three linearly independent three-dimensional vectors:
(1, 0, 0), (0, 1, 0), (0, 0, 1). Substituting the components of each of them
into the general solution as values for the free unknowns and computing the
values for $x_1$ and $x_2$, we get the following fundamental system of solutions
of the given system of equations:

$$\alpha_1 = \left(\tfrac{19}{8}, \tfrac{7}{8}, 1, 0, 0\right), \qquad
\alpha_2 = \left(\tfrac{3}{8}, -\tfrac{25}{8}, 0, 1, 0\right), \qquad
\alpha_3 = \left(-\tfrac{1}{2}, \tfrac{1}{2}, 0, 0, 1\right)$$
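
The construction of a fundamental system (give the free unknowns the rows of a nonzero determinant, here the unit vectors, and compute the remaining unknowns) can be sketched in Python. The function name `fundamental_system` is hypothetical, and full row reduction is used here in place of Cramer’s rule; the coefficient rows are those of the example above as read from the text.

```python
from fractions import Fraction

def fundamental_system(coeffs):
    """Return a fundamental system of solutions of the homogeneous system coeffs * x = 0."""
    m = [[Fraction(v) for v in row] for row in coeffs]
    n = len(m[0])
    pivots = []  # column of the leading unknown in each reduced row
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot: the unknown in this column is free
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        # one free unknown equals 1, the other free unknowns equal 0
        vec = [Fraction(0)] * n
        vec[f] = Fraction(1)
        for row, p in zip(m, pivots):
            vec[p] = -row[f]
        basis.append(vec)
    return basis

system = [[3, 1, -8, 2, 1],
          [2, -2, -3, -7, 2],
          [1, 11, -12, 34, -5],
          [1, -5, 2, -16, 3]]
for vector in fundamental_system(system):
    print([str(v) for v in vector])
```

The number of vectors returned equals the number of unknowns minus the rank, in agreement with the theorem proved above.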


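We conclude this section by considering the relationship between
the solutions of nonhomogeneous and homogeneous systems. Suppose
we have a system of nonhomogeneous linear equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2, \\
\dots\dots\dots\dots\dots\dots\dots\dots\dots & \\
a_{s1}x_1 + a_{s2}x_2 + \dots + a_{sn}x_n &= b_s
\end{aligned} \tag{5}$$

The system of homogeneous linear equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= 0, \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= 0, \\
\dots\dots\dots\dots\dots\dots\dots\dots\dots & \\
a_{s1}x_1 + a_{s2}x_2 + \dots + a_{sn}x_n &= 0
\end{aligned} \tag{6}$$

obtained from (5) by replacing the constant terms by zeros is called
the reduced system of (5). There is a close connection between the
solutions of (5) and (6), as the following two theorems indicate.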
I. The sum of any solution of system (5) and any solution of the 
reduced system (6) is again a solution of system (5). 
Indeed, let $c_1, c_2, \dots, c_n$ be a solution of (5), and $d_1, d_2, \dots, d_n$
a solution of (6). Take any one of the equations of system (5), say
the kth, and substitute into it the numbers $c_1 + d_1, c_2 + d_2, \dots, c_n + d_n$
in place of the unknowns. We get

$$\sum_{j=1}^{n} a_{kj}(c_j + d_j) = \sum_{j=1}^{n} a_{kj}c_j + \sum_{j=1}^{n} a_{kj}d_j = b_k + 0 = b_k$$


II. The difference between any two solutions of (5) is a solution of (6). 
Indeed, let $c_1, c_2, \dots, c_n$ and $c_1', c_2', \dots, c_n'$ be solutions of
system (5). Take any one of the equations of (6), say the kth, and
substitute into it in place of the unknowns the numbers

$$c_1 - c_1',\ c_2 - c_2',\ \dots,\ c_n - c_n'$$

This yields

$$\sum_{j=1}^{n} a_{kj}(c_j - c_j') = \sum_{j=1}^{n} a_{kj}c_j - \sum_{j=1}^{n} a_{kj}c_j' = b_k - b_k = 0$$


It follows from these theorems that by finding one solution of the 
system (5) of nonhomogeneous linear equations and adding it to every 
solution of the reduced system (6), we obtain all solutions of (5). 
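
Theorems I and II can be checked numerically on a small system. The sketch below is illustrative only: the helper name `left_sides`, the particular 3 × 5 system and the two solutions `c1` and `c2` are assumptions chosen for the demonstration.

```python
def left_sides(coeffs, x):
    """Values of the left-hand sides of the equations for the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in coeffs]

# An assumed consistent nonhomogeneous system (5): coeffs * x = consts
coeffs = [[1, 1, -2, -1, 1],
          [3, -1, 1, 4, 3],
          [1, 5, -9, -8, 1]]
consts = [1, 4, 0]

c1 = [2, 5, 3, 0, 0]      # two particular solutions of (5)
c2 = [3, 5, 2, 1, -2]

# Theorem II: the difference of two solutions of (5) solves the reduced system (6)
difference = [u - v for u, v in zip(c1, c2)]

# Theorem I: a solution of (5) plus a solution of (6) is again a solution of (5);
# 3 * difference is a solution of (6) because solutions of (6) may be scaled
shifted = [u + 3 * d for u, d in zip(c2, difference)]

print(left_sides(coeffs, difference))
print(left_sides(coeffs, shifted))
```

The first printed list should consist of zeros and the second should reproduce the constant terms.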


CHAPTER 3 


THE ALGEBRA 
OF MATRICES 


13. Matrix Multiplication 


In the preceding chapters the concept of a matrix was utilized 
as an essential auxiliary tool in the study of systems of linear equa- 
tions. Numerous other applications have made it the subject of a 
large independent theory, many branches of which go beyond the 
limits of this course. We shall now discuss the fundamentals of this 
theory which starts with the fact that two algebraic operations, 
addition and multiplication, are defined in the set of all square matri- 
ces of a given order in a very peculiar but fully motivated fashion. 
We begin with the multiplication of matrices; addition will be intro- 
duced in Sec. 15. 

From the course of analytic geometry we know that when the
axes of a rectangular coordinate system in the plane are rotated
through an angle $\alpha$, the coordinates of a point are transformed
according to the following formulas:

$$\begin{aligned}
x &= x'\cos\alpha - y'\sin\alpha, \\
y &= x'\sin\alpha + y'\cos\alpha
\end{aligned}$$

where x and y are the old coordinates of the point, and x', y' are the
new coordinates. Thus, x and y are expressed linearly in terms of
x' and y' with certain numerical coefficients. There are also many
other instances of the substitution of unknowns (or variables) in
which the old unknowns are linearly expressed in terms of the new
ones. Such a substitution of unknowns is ordinarily called a linear
transformation (or linear substitution). We thus arrive at the following
definition.
A linear transformation of unknowns is a transition from a set
of n unknowns $x_1, x_2, \dots, x_n$ to a set of n unknowns $y_1, y_2, \dots, y_n$
such that the old unknowns are expressed linearly in terms of the
new ones with certain numerical coefficients:

$$\begin{aligned}
x_1 &= a_{11}y_1 + a_{12}y_2 + \dots + a_{1n}y_n, \\
x_2 &= a_{21}y_1 + a_{22}y_2 + \dots + a_{2n}y_n, \\
&\dots\dots\dots\dots\dots\dots\dots\dots \\
x_n &= a_{n1}y_1 + a_{n2}y_2 + \dots + a_{nn}y_n
\end{aligned} \tag{1}$$

The linear transformation (1) is fully determined by its coefficient
matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{pmatrix}$$

since two linear transformations with the same matrix can differ only
in the letters denoting the unknowns; we take it, however, that the
choice of these designations is wholly in our own hands. Conversely,
specifying an arbitrary matrix of order n, we can immediately write
the linear transformation for which this matrix serves as the coefficient
matrix. Thus, there is a one-to-one correspondence between
the linear transformations of n unknowns and the square matrices 
of order n. Therefore, every concept involving linear transformations 
and every property of these transformations must be associated with 
a similar concept or property involving matrices. 

Let us examine the problem of a successive performance of two
linear transformations. Suppose that following the linear transformation
(1) there is effected a linear transformation

$$\begin{aligned}
y_1 &= b_{11}z_1 + b_{12}z_2 + \dots + b_{1n}z_n, \\
y_2 &= b_{21}z_1 + b_{22}z_2 + \dots + b_{2n}z_n, \\
&\dots\dots\dots\dots\dots\dots\dots\dots \\
y_n &= b_{n1}z_1 + b_{n2}z_2 + \dots + b_{nn}z_n
\end{aligned} \tag{2}$$

which takes the set of unknowns $y_1, y_2, \dots, y_n$ into the set
$z_1, z_2, \dots, z_n$; denote the matrix of this transformation by B.
Substituting into (1) the expressions for $y_1, y_2, \dots, y_n$ from (2),
we get linear expressions for the unknowns $x_1, x_2, \dots, x_n$ in terms
of the unknowns $z_1, z_2, \dots, z_n$. Thus, the result of a successive
execution of two linear transformations of unknowns will again be
a linear transformation.


Example. The result of the successive performance of the linear transformations

$$\begin{aligned} x_1 &= 3y_1 - y_2, \\ x_2 &= y_1 + 5y_2 \end{aligned}
\qquad\text{and}\qquad
\begin{aligned} y_1 &= z_1 + z_2, \\ y_2 &= 4z_1 + 2z_2 \end{aligned}$$

is the linear transformation

$$\begin{aligned}
x_1 &= 3(z_1 + z_2) - (4z_1 + 2z_2) = -z_1 + z_2, \\
x_2 &= (z_1 + z_2) + 5(4z_1 + 2z_2) = 21z_1 + 11z_2
\end{aligned}$$

Denote by C the matrix of the linear transformation which
is the result of the successive performance of transformations (1)
and (2) and find the law by which its elements $c_{ik}$, i, k = 1, 2, ..., n,
are expressed in terms of the elements of the matrices A and B.
Writing down the transformations (1) and (2) succinctly in the form

$$x_i = \sum_{j=1}^{n} a_{ij}y_j, \quad i = 1, 2, \dots, n; \qquad
y_j = \sum_{k=1}^{n} b_{jk}z_k, \quad j = 1, 2, \dots, n$$

we get

$$x_i = \sum_{j=1}^{n} a_{ij}\left(\sum_{k=1}^{n} b_{jk}z_k\right)
= \sum_{k=1}^{n}\left(\sum_{j=1}^{n} a_{ij}b_{jk}\right)z_k, \qquad i = 1, 2, \dots, n$$

Thus, the coefficient of $z_k$ in the expression for $x_i$ (that is, the element
$c_{ik}$ of matrix C) is of the form

$$c_{ik} = \sum_{j=1}^{n} a_{ij}b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + \dots + a_{in}b_{nk} \tag{3}$$


The element of matrix C in the ith row and kth column is equal to
the sum of the products of the corresponding elements of the ith row
of matrix A and the kth column of matrix B.

Formula (3), which expresses the elements of matrix C in terms 
of the elements of matrices A and B, permits us to write down C 
immediately, given A and B, without having to examine the linear 
transformations corresponding to the matrices A and B. In this 
fashion, a one-to-one correspondence is set up between any pair 
of square matrices of order n and a definite third matrix. We can 
say that in the set of all square matrices of nth order we have 
defined an algebraic operation which is called matrix multiplica- 
tion, and matrix C is called the product of the matrix A by the 
matrix B: 

C = AB 

Let us once again formulate the relationship between linear
transformations and matrix multiplication.

A linear transformation of unknowns obtained as a result of the
successive performance of two linear transformations with matrices A and
B has as its coefficient matrix the matrix AB.
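
Formula (3) translates directly into code. The following Python sketch is an illustration under stated assumptions: `mat_mul` and `apply` are hypothetical helper names, and the two 2 × 2 matrices are the coefficient matrices of the pair of transformations in the example preceding formula (3).

```python
def mat_mul(a, b):
    """Product C = AB with c_ik = a_i1*b_1k + a_i2*b_2k + ... + a_in*b_nk."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("each row of A must be as long as each column of B")
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]

def apply(matrix, values):
    """Given values of the new unknowns, return the values of the old unknowns."""
    return [sum(c * v for c, v in zip(row, values)) for row in matrix]

A = [[3, -1], [1, 5]]   # x1 = 3*y1 - y2,  x2 = y1 + 5*y2
B = [[1, 1], [4, 2]]    # y1 = z1 + z2,    y2 = 4*z1 + 2*z2
C = mat_mul(A, B)       # matrix of the combined transformation from z to x

z = [2, -3]             # arbitrary test values for z1, z2
print(C)
print(apply(A, apply(B, z)), apply(C, z))
```

Substituting through y step by step and applying C in one step should give the same values of x, which is the content of the statement above.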

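Examples.

$$(1)\quad \begin{pmatrix} 4 & 9 \\ -1 & 3 \end{pmatrix}
\begin{pmatrix} 1 & -3 \\ -2 & 1 \end{pmatrix}
= \begin{pmatrix} 4\cdot 1 + 9\cdot(-2) & 4\cdot(-3) + 9\cdot 1 \\ (-1)\cdot 1 + 3\cdot(-2) & (-1)\cdot(-3) + 3\cdot 1 \end{pmatrix}
= \begin{pmatrix} -14 & -3 \\ -7 & 6 \end{pmatrix}$$

$$(2)\quad \begin{pmatrix} 2 & 0 & 1 \\ -2 & 3 & 2 \\ 4 & -1 & 5 \end{pmatrix}
\begin{pmatrix} -3 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & -1 & 3 \end{pmatrix}
= \begin{pmatrix} -6 & 1 & 3 \\ 6 & 2 & 9 \\ -12 & -3 & 14 \end{pmatrix}$$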


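Oe (Fa) = (04) (2a) = (3's)
(4) Find the result of the successive performance of the linear transformations

$$\begin{aligned}
x_1 &= 5y_1 - y_2 + 3y_3, \\
x_2 &= y_1 - 2y_2, \\
x_3 &= 7y_2 - y_3
\end{aligned}$$

and

$$\begin{aligned}
y_1 &= 2z_1 + z_3, \\
y_2 &= z_2 - 5z_3, \\
y_3 &= 2z_2
\end{aligned}$$

Multiplying the matrices, we obtain

$$\begin{pmatrix} 5 & -1 & 3 \\ 1 & -2 & 0 \\ 0 & 7 & -1 \end{pmatrix}
\begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & -5 \\ 0 & 2 & 0 \end{pmatrix}
= \begin{pmatrix} 10 & 5 & 10 \\ 2 & -2 & 11 \\ 0 & 5 & -35 \end{pmatrix}$$

The desired linear transformation is therefore of the form

$$\begin{aligned}
x_1 &= 10z_1 + 5z_2 + 10z_3, \\
x_2 &= 2z_1 - 2z_2 + 11z_3, \\
x_3 &= 5z_2 - 35z_3
\end{aligned}$$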


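Take one of the above examples of matrix multiplication,
say (2), and find the product of the same matrices, but in reverse
order:

$$\begin{pmatrix} -3 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & -1 & 3 \end{pmatrix}
\begin{pmatrix} 2 & 0 & 1 \\ -2 & 3 & 2 \\ 4 & -1 & 5 \end{pmatrix}
= \begin{pmatrix} -8 & 3 & -1 \\ 0 & 5 & 9 \\ 14 & -6 & 13 \end{pmatrix}$$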


We see that the product of the matrices depends on the order 
of the factors; in other words, matrix multiplication is noncommuta- 
tive. Actually, this is something we should have expected, if only 
because the matrices A and B are not of equal status in the 
definition of matrix C given above by means of formula (3): in A 
we take the rows and in B the columns. 

Examples of noncommutative matrices of order n, that is, matrices
whose product changes with an interchange of the factors, may
be given for all n beyond n = 1 [the second-order matrices in Example
(1) are noncommutative]. On the other hand, two given matrices
may accidentally turn out to be commutative, as witness the following
example:

$$\begin{pmatrix} 7 & -12 \\ -4 & 7 \end{pmatrix}
\begin{pmatrix} 26 & 45 \\ 15 & 26 \end{pmatrix}
= \begin{pmatrix} 26 & 45 \\ 15 & 26 \end{pmatrix}
\begin{pmatrix} 7 & -12 \\ -4 & 7 \end{pmatrix}
= \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix}$$

Matrix multiplication is associative; one can therefore speak of
a uniquely defined product of any finite number of matrices of
order n taken in a definite order (because of the noncommutativity
of multiplication).


Proof. Suppose we have three arbitrary matrices of order n, A,
B and C. In abbreviated notation (which indicates the general
aspect of their elements) we have $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$.
We also introduce the following designations:

$$AB = U = (u_{ij}), \qquad BC = V = (v_{ij}),$$
$$(AB)C = S = (s_{ij}), \qquad A(BC) = T = (t_{ij})$$

We have to prove the truth of the equation $(AB)C = A(BC)$, that
is, S = T. However,

$$u_{il} = \sum_{k=1}^{n} a_{ik}b_{kl}, \qquad v_{kj} = \sum_{l=1}^{n} b_{kl}c_{lj}$$

and, therefore, because of the equations S = UC, T = AV,

$$s_{ij} = \sum_{l=1}^{n} u_{il}c_{lj} = \sum_{l=1}^{n}\sum_{k=1}^{n} a_{ik}b_{kl}c_{lj},$$
$$t_{ij} = \sum_{k=1}^{n} a_{ik}v_{kj} = \sum_{k=1}^{n}\sum_{l=1}^{n} a_{ik}b_{kl}c_{lj}$$

That is to say, $s_{ij} = t_{ij}$ for i, j = 1, 2, ..., n.
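
The two properties just discussed, noncommutativity and associativity, can be observed on concrete matrices. In this hedged Python sketch `mat_mul` is a hypothetical helper implementing formula (3); P and Q are the third-order matrices multiplied above in both orders, and M and N are the commuting second-order pair.

```python
def mat_mul(a, b):
    """Product of matrices given as lists of rows, following formula (3)."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]

P = [[2, 0, 1], [-2, 3, 2], [4, -1, 5]]
Q = [[-3, 1, 0], [0, 2, 1], [0, -1, 3]]

M = [[7, -12], [-4, 7]]
N = [[26, 45], [15, 26]]

print(mat_mul(P, Q) == mat_mul(Q, P))   # order of the factors matters in general
print(mat_mul(M, N) == mat_mul(N, M))   # but some pairs do commute
print(mat_mul(mat_mul(P, Q), P) == mat_mul(P, mat_mul(Q, P)))  # grouping does not matter
```

A numerical check of this kind illustrates the statements but does not replace the proof given above.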

To go deeper into the properties of matrix multiplication we 
have to study their determinants. For the sake of brevity, we agree 
to denote the determinant of matrix A by | A |. If in each of the 
above examples the reader takes the pains to compute the determinants
of the matrices being multiplied and to compare the product
of these determinants with the determinant of the product of the 
given matrices, he will detect an extremely curious regularity 
which is expressed as the following very important theorem on the 
multiplication of determinants. 

The determinant of a product of several matrices of order n is equal 
to the product of the determinants of these matrices. 

It will suffice to prove this theorem for the case of two matrices.
Let there be given the nth-order matrices $A = (a_{ij})$ and $B = (b_{ij})$
and let $AB = C = (c_{ij})$. Construct the following auxiliary determinant
$\Delta$ of order 2n: in the upper left corner put matrix A, in the
lower right corner, matrix B; the entire upper right corner will be
occupied by zeros; finally, put the number -1 along the principal
diagonal of the lower left corner and zeros elsewhere. Determinant
$\Delta$ will then look like this:

$$\Delta = \begin{vmatrix}
a_{11} & a_{12} & \dots & a_{1n} & 0 & 0 & \dots & 0 \\
a_{21} & a_{22} & \dots & a_{2n} & 0 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn} & 0 & 0 & \dots & 0 \\
-1 & 0 & \dots & 0 & b_{11} & b_{12} & \dots & b_{1n} \\
0 & -1 & \dots & 0 & b_{21} & b_{22} & \dots & b_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \dots & -1 & b_{n1} & b_{n2} & \dots & b_{nn}
\end{vmatrix}$$

Applying the Laplace theorem to the determinant $\Delta$ (expansion
about the first n rows), we get the following equation:

$$\Delta = |A| \cdot |B| \tag{4}$$

Now let us attempt to transform the determinant $\Delta$, without
changing its value, so that all elements $b_{ij}$, i, j = 1, 2, ..., n,
are replaced by zeros. To do this, add to the (n + 1)th column
of $\Delta$ its first column multiplied by $b_{11}$, the second multiplied by $b_{21}$,
and so on, and finally, the nth column multiplied by $b_{n1}$. Then
add to the (n + 2)th column of determinant $\Delta$ the first column
multiplied by $b_{12}$, the second multiplied by $b_{22}$, and so on. Generally,
we add to the (n + j)th column of the determinant $\Delta$, where
j = 1, 2, ..., n, the sum of the first n columns taken, respectively,
with the coefficients $b_{1j}, b_{2j}, \dots, b_{nj}$.

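It is easy to see that these manipulations do not change the
determinant and actually result in the replacement of all elements
$b_{ij}$ by zeros. At the same time, in place of the zeros in the
upper right corner of the determinant there appear the following
numbers: at the intersection of the ith row and the (n + j)th column
of the determinant, i, j = 1, 2, ..., n, will stand the number
$a_{i1}b_{1j} + a_{i2}b_{2j} + \dots + a_{in}b_{nj}$ equal [because of (3)] to the element
$c_{ij}$ of matrix C = AB. The upper right corner of the determinant
is now occupied by matrix C:

$$\Delta = \begin{vmatrix}
a_{11} & a_{12} & \dots & a_{1n} & c_{11} & c_{12} & \dots & c_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} & c_{21} & c_{22} & \dots & c_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn} & c_{n1} & c_{n2} & \dots & c_{nn} \\
-1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 \\
0 & -1 & \dots & 0 & 0 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \dots & -1 & 0 & 0 & \dots & 0
\end{vmatrix}$$
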

Apply the Laplace theorem once again, expanding the determinant
about the last n columns. The complementary minor of
the minor | C | is equal to $(-1)^n$, and since the minor | C | is located
in rows with position numbers 1, 2, ..., n and in columns with
position numbers n + 1, n + 2, ..., 2n, and

$$1 + 2 + \dots + n + (n + 1) + (n + 2) + \dots + 2n = 2n^2 + n$$

it follows that

$$\Delta = (-1)^{2n^2 + n}(-1)^n\,|C| = (-1)^{2(n^2 + n)}\,|C|$$

or, due to the evenness of the number $2(n^2 + n)$,

$$\Delta = |C| \tag{5}$$

Finally, from (4) and (5) follows the equation we set out to prove:

$$|C| = |A| \cdot |B|$$

The multiplication theorem for determinants could be proved
without invoking the Laplace theorem. One such proof is given
at the end of Sec. 16.
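
The multiplication theorem and the auxiliary determinant of order 2n used in its proof can both be checked numerically. The Python sketch below is illustrative: `det`, `mat_mul` and `auxiliary` are hypothetical helper names, the determinant is computed by plain cofactor expansion (suitable only for small orders), and P, Q are sample third-order integer matrices.

```python
def det(m):
    """Determinant by expansion along the first row; the empty matrix has determinant 1."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue  # a zero entry contributes nothing to the expansion
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def mat_mul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]

def auxiliary(a, b):
    """Matrix of order 2n: A upper left, zeros upper right, -E lower left, B lower right."""
    n = len(a)
    top = [a[i] + [0] * n for i in range(n)]
    bottom = [[-1 if j == i else 0 for j in range(n)] + b[i] for i in range(n)]
    return top + bottom

P = [[2, 0, 1], [-2, 3, 2], [4, -1, 5]]
Q = [[-3, 1, 0], [0, 2, 1], [0, -1, 3]]

print(det(P), det(Q), det(mat_mul(P, Q)), det(auxiliary(P, Q)))
```

The last two printed numbers should both equal the product of the first two, in line with equations (4) and (5).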


14. Inverse Matrices 


A square matrix is called singular if its determinant is zero, 
otherwise it is nonsingular. Accordingly, a linear transformation 
of unknowns is called singular or nonsingular depending on whether 
the coefficient determinant of this transformation is zero or not. 
The following assertion follows from the theorem proved at the 
end of Sec. 13. 

The product of matrices, at least one of which is singular, is a singular
matrix.

The product of any nonsingular matrices is a nonsingular matrix.

From this there follows the assertion (because of the relationship
existing between matrix multiplication and the successive performance
of linear transformations): the result of a successive performance
of several linear transformations is a nonsingular transformation
if and only if all the given transformations are nonsingular.

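The role of unity in matrix multiplication is played by the unit
(identity) matrix

$$E = \begin{pmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & 1
\end{pmatrix}$$

It commutes with any matrix A of a given order,

$$AE = EA = A \tag{1}$$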
These equalities are proved either by direct application of the
rule for multiplying matrices or on the basis of the remark that the
unit (identity) matrix corresponds to an identical linear transformation
of unknowns:

$$x_1 = y_1, \quad x_2 = y_2, \quad \dots, \quad x_n = y_n$$

the performance of which, either prior to or following any other
linear transformation, obviously does not alter that transformation.
Note that matrix E is the only matrix which satisfies condition (1)
for any matrix A. Indeed, if there were also a matrix E' with this
property, we would have

$$E'E = E', \qquad E'E = E$$

whence E' = E.

The question of whether a given matrix A has an inverse turns
out to be more complicated. Since matrix multiplication is not
commutative, we will now speak of the right inverse matrix, that
is, a matrix $A^{-1}$ such that postmultiplication of A by this matrix
yields the identity matrix:

$$AA^{-1} = E \tag{2}$$

Suppose matrix A is singular; then if matrix $A^{-1}$ existed, the product
on the left of (2) would, as we know, be a singular matrix, whereas
in actual fact the matrix E in the right member of this equation
is nonsingular since its determinant is equal to unity. Thus a singular
matrix cannot have a right inverse matrix. Similar reasoning shows
that it cannot have a left inverse matrix either, and for this
reason, a singular matrix has no inverse at all.

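Passing to the case of a nonsingular matrix, let us first introduce
the following auxiliary concept. Suppose we have an nth-order
matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{pmatrix}$$

The matrix

$$A^* = \begin{pmatrix}
A_{11} & A_{21} & \dots & A_{n1} \\
A_{12} & A_{22} & \dots & A_{n2} \\
\vdots & \vdots & & \vdots \\
A_{1n} & A_{2n} & \dots & A_{nn}
\end{pmatrix}$$

which consists of the cofactors of the elements of A (note that the
cofactor of element $a_{ij}$ lies at the intersection of the jth row and
the ith column) is called the adjoint of matrix A.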

Let us find the products AA* and A*A. Using the familiar formula
(see Sec. 6) for the expansion of a determinant about a row
or column, and also the theorem (see Sec. 7) on the sum of the products
of the elements of any row (column) of a determinant by the
cofactors of the corresponding elements of another row (column)
and denoting by d the determinant of the matrix A,

$$d = |A|$$

we get the following equations:

$$AA^* = A^*A = \begin{pmatrix}
d & 0 & \dots & 0 \\
0 & d & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & d
\end{pmatrix} \tag{3}$$

From this it follows that if matrix A is nonsingular, then its
adjoint A* will also be nonsingular; note that the determinant d*
of matrix A* is equal to the (n - 1)th power of the determinant d of
matrix A.

Indeed, passing from (3) to the equality between the determinants,
we get

$$dd^* = d^n$$

whence, because $d \neq 0$,

$$d^* = d^{n-1}$$

(We could prove that if matrix A is singular, then its adjoint
A* is also singular and has rank which does not exceed 1.)

It is now easy to prove the existence of an inverse matrix for
any nonsingular matrix A and to find its form. Note first that if
we consider the product of two matrices AB and if we divide all
the elements of one of the factors, say B, by one and the same
number d, then all elements of the product AB are divided by this
number as well: to prove this all we need to do is recall the definition
of matrix multiplication. Thus, if

$$d = |A| \neq 0$$

then from (3) it follows that the inverse of A is a matrix obtained from
the adjoint A* by means of division of all its elements by the number d:

$$A^{-1} = \begin{pmatrix}
\dfrac{A_{11}}{d} & \dfrac{A_{21}}{d} & \dots & \dfrac{A_{n1}}{d} \\
\dfrac{A_{12}}{d} & \dfrac{A_{22}}{d} & \dots & \dfrac{A_{n2}}{d} \\
\vdots & \vdots & & \vdots \\
\dfrac{A_{1n}}{d} & \dfrac{A_{2n}}{d} & \dots & \dfrac{A_{nn}}{d}
\end{pmatrix}$$

Indeed, from (3) follow the equalities

$$AA^{-1} = A^{-1}A = E \tag{4}$$

We stress once again that the ith row of matrix $A^{-1}$ contains
the cofactors of the elements of the ith column of determinant | A |
divided by d = | A |.

It is easy to prove that matrix $A^{-1}$ is the only matrix which
satisfies condition (4) for a given nonsingular matrix A. True enough,
if matrix C is such that

$$AC = CA = E$$

then

$$CAA^{-1} = C(AA^{-1}) = CE = C,$$
$$CAA^{-1} = (CA)A^{-1} = EA^{-1} = A^{-1}$$

whence $C = A^{-1}$.

From (4) and the multiplication theorem for determinants it
follows that the determinant of matrix $A^{-1}$ is equal to $1/d$, so that
this matrix is also nonsingular; its inverse is the matrix A.

Now, if we have square matrices A and B of order n, of which A
is nonsingular and B is arbitrary, then we can perform the right
and left divisions of B by A, that is, we can solve the matrix equations

$$AX = B, \qquad YA = B \tag{5}$$

To do this, it will suffice (because of the associativity of matrix
multiplication) to set

$$X = A^{-1}B, \qquad Y = BA^{-1}$$

These solutions of equations (5) will, in the general case (because
matrix multiplication is not commutative), be distinct.
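
The construction of the inverse from the adjoint, and the two divisions in (5), can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names `det`, `cofactor`, `inverse` and `mat_mul` are hypothetical, and the matrices A and B are sample values chosen for the demonstration.

```python
from fractions import Fraction

def det(m):
    """Determinant by expansion along the first row; the empty matrix has determinant 1."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def cofactor(m, i, j):
    """Cofactor A_ij: signed minor obtained by deleting row i and column j (0-based)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def inverse(m):
    """Inverse as the adjoint divided by the determinant."""
    d = det(m)
    if d == 0:
        raise ValueError("a singular matrix has no inverse")
    n = len(m)
    # entry (i, j) of the inverse is A_ji / d: the cofactors are transposed
    return [[Fraction(cofactor(m, j, i), d) for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]

A = [[3, 2], [4, 3]]          # nonsingular: its determinant equals 1
B = [[-1, 7], [3, 5]]
X = mat_mul(inverse(A), B)    # solves AX = B
Y = mat_mul(B, inverse(A))    # solves YA = B
print(X, Y)
```

For matrices of large order this cofactor approach is far too slow; it is used here only because it mirrors the formula in the text.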


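Example 1. Given a matrix

$$A = \begin{pmatrix} 3 & -1 & 0 \\ -2 & 1 & 1 \\ 2 & -1 & 4 \end{pmatrix}$$

Its determinant | A | = 5, and so the inverse matrix $A^{-1}$ exists:

$$A^{-1} = \begin{pmatrix}
1 & \frac{4}{5} & -\frac{1}{5} \\
2 & \frac{12}{5} & -\frac{3}{5} \\
0 & \frac{1}{5} & \frac{1}{5}
\end{pmatrix}$$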

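Example 2. Given the matrices

$$A = \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} -1 & 7 \\ 3 & 5 \end{pmatrix}$$

The matrix A is nonsingular, and

$$A^{-1} = \begin{pmatrix} 3 & -2 \\ -4 & 3 \end{pmatrix}$$

Therefore the following matrices are solutions of the equations AX = B,
YA = B:

$$X = A^{-1}B = \begin{pmatrix} -9 & 11 \\ 13 & -13 \end{pmatrix}, \qquad
Y = BA^{-1} = \begin{pmatrix} -31 & 23 \\ -11 & 9 \end{pmatrix}$$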




Multiplication of rectangular matrices. Although in the preceding 
section matrix multiplication was only defined for square matrices 
of the same order, it carries over to the case of rectangular matrices A 
and B, provided it is possible to apply formula (3) of the preceding 
section, i.e., if any row of matrix A contains as many elements 
as there are in any column of matrix B. In other words, one can 
speak of the product of rectangular matrices A and B if the number 
of columns of matrix A is equal to the number of rows of matrix B, 
the number of rows of matrix AB being equal to the number of rows 
of A, and the number of columns of matrix AB to the number of columns 
of B. 


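Examples.

(1) [Product of a 2 x 4 matrix by a 4 x 3 matrix, giving a 2 x 3 matrix; the entries are not legible in the source.]

(2) $$\begin{pmatrix} 0 & -3 & 1 \\ 2 & 1 & 5 \\ -4 & 0 & -2 \end{pmatrix} \begin{pmatrix} 3 \\ -2 \\ 2 \end{pmatrix} = \begin{pmatrix} 8 \\ 14 \\ -16 \end{pmatrix}$$

(3) [Product of the row (5 1 0 -3) by a 4 x 2 matrix, giving the row (11 -1); the entries of the second factor are not legible in the source.]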


Multiplication of rectangular matrices may be related to a 
successive performance of linear transformations of the unknowns, 
provided only that in the definition of the latter we give up the 
assumption that the number of unknowns is preserved under the 
linear transformation. 

It is also easy to verify, by repeating word for word the proof 
given above for the case of square matrices, that associativity holds 
true also for the multiplication of rectangular matrices. 

We now take advantage of the multiplication of rectangular 
matrices and of properties of the inverse matrix for a new deriva- 
tion of Cramer’s rule, which does not require the involved compu- 
tations that were carried out in Sec. 7. Let there be given a system 
of n linear equations in n unknowns: 


$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2, \\
\dots \qquad \dots \qquad \dots \qquad & \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= b_n
\end{aligned} \tag{6}$$

The determinant of this system is different from zero. Denote by A
the coefficient matrix of system (6); this matrix is nonsingular
since, by hypothesis, $d = |A| \neq 0$. Denote by X the column of
unknowns, by B the column of constant terms of (6); thus

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad B = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$


The product AX is meaningful since the number of columns of 
matrix A is equal to the number of rows of matrix X; this product 
will be a column composed of the left-hand sides of the equations 
of system (6). Thus, (6) may be written in the form of a single matrix
equation

$$AX = B \tag{7}$$


Multiplying both sides of (7) on the left by the matrix $A^{-1}$, the
existence of which follows from the nonsingular nature of the square
matrix A, we get

$$X = A^{-1}B \tag{8}$$


The product on the right is a matrix of one column; its jth
element is equal to the sum of the products of the elements of the
jth row of matrix $A^{-1}$ by the corresponding elements of matrix B,
that is, it is equal to the number

$$\frac{A_{1j}}{d}b_1 + \frac{A_{2j}}{d}b_2 + \dots + \frac{A_{nj}}{d}b_n = \frac{1}{d}\left(A_{1j}b_1 + A_{2j}b_2 + \dots + A_{nj}b_n\right)$$

The expression in parentheses on the right is the expansion about the
jth column of the determinant $d_j$, which is obtained by replacing the
jth column of d by the column B. Thus, formulas (8) are equivalent
to formulas (3), Sec. 7, which express the solution obtained by
Cramer’s rule to system (6).

It remains to show that the values of the unknowns thus obtained 
are indeed the solution of system (6). To do this, put expression (8) 
into the matrix equation (7); it obviously yields the identity B = B. 
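
The agreement between formula (8) and Cramer's rule can be checked numerically. The sketch below is an illustration under stated assumptions: the function names and the sample system are hypothetical, determinants are computed by first-row expansion, and indices are 0-based.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix left after deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row (the empty matrix gives 1)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def solve_by_inverse(a, b):
    """x_j = (A_1j b_1 + ... + A_nj b_n) / d, i.e. the jth element of A^{-1} B."""
    n, d = len(a), Fraction(det(a))
    if d == 0:
        raise ValueError("the determinant of the system is zero")
    return [
        sum((-1) ** (i + j) * det(minor(a, i, j)) * b[i] for i in range(n)) / d
        for j in range(n)
    ]

def solve_by_cramer(a, b):
    """x_j = d_j / d, where d_j has the jth column of d replaced by b."""
    d = Fraction(det(a))
    if d == 0:
        raise ValueError("the determinant of the system is zero")
    return [
        det([row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]) / d
        for j in range(len(a))
    ]

A = [[3, -1, 0], [-2, 1, 1], [2, -1, 4]]   # sample coefficient matrix, d = 5
B = [5, 0, 15]                             # sample constant terms (an assumption)
x_inverse = solve_by_inverse(A, B)
x_cramer = solve_by_cramer(A, B)
```

Both routes give the same values of the unknowns for this sample system, which is what the equivalence of (8) with the formulas of Sec. 7 asserts.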

The rank of a product of matrices. In the case of singular matrices,
the multiplication theorem for determinants tells us nothing
beyond the fact that their product will also be singular,
although singular square matrices can be distinguished according
to rank as well. Note that there is no completely definite relationship
between the ranks of the factors and the rank of the product,
as is evident from the following examples:

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix},$$
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
In both cases, matrices of rank 1 are multiplied, but in one case
the product has rank 1, in the other, rank 0. It is only the following 
theorem which holds true (and not only for square but for rectangular 
matrices as well). 

The rank of a product of matrices does not exceed the rank of each 
of the factors. 

It will suffice to prove this theorem for the case of two factors. 
Suppose we have matrices A and B for which the product AB is 
meaningful: AB = C. We consider formula (8), Sec. 13, which
yields an expression for the elements of matrix C. Taking this
formula for a given k and for all possible i (i = 1, 2, ...),
we find that the kth column of matrix C is the sum of all the columns
of matrix A taken with certain coefficients (namely, with the coefficients
$b_{1k}, b_{2k}, \dots$). This proves that the system of columns of
matrix C is expressed linearly in terms of the system of columns
of matrix A and, therefore, as shown in Sec. 9, the rank of the first
system is less than or equal to the rank of the second system; in
other words, the rank of matrix C does not exceed the rank of matrix
A. On the other hand, since from this same formula (8), Sec. 13,
it follows, for a given i and for all k, that the ith row of matrix
C is a linear combination of the rows of matrix B, we find by analogous
reasoning that the rank of C is not greater than the rank of B.

A more precise result is obtained in the case when one of the
factors is a nonsingular square matrix.

The rank of the product obtained by pre- or postmultiplication 
of an arbitrary matrix A by a nonsingular square matrix Q is equal 
to the rank of matrix A. 

For example, suppose 


AQ=C (9) 


From the preceding theorem it follows that the rank of matrix C
is not greater than the rank of matrix A. However, multiplying (9)
on the right by $Q^{-1}$, we arrive at the equation

$$A = CQ^{-1}$$

and for this reason, again on the basis of the preceding theorem,
the rank of A does not exceed that of C. A comparison of these two
results proves the coincidence of the ranks of matrices A and C.
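
Both rank theorems can be observed on small matrices. The following is a sketch under assumptions: `rank` is a hypothetical helper that counts pivots in exact Gaussian elimination, and all the sample matrices are chosen only for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Row-by-column product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(m):
    """Rank of a nonempty matrix, found as the number of pivots in row reduction."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Two products of rank-1 factors: one of rank 1, one of rank 0.
P = [[1, 0], [0, 0]]
R = [[2, 0], [0, 0]]
S = [[0, 0], [0, 3]]

# Multiplication by a nonsingular square matrix preserves the rank.
A = [[1, 2, 3], [0, 1, 4]]                 # 2 x 3, rank 2
Q_right = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]  # nonsingular, order 3
Q_left = [[2, 1], [1, 1]]                  # nonsingular, order 2
```

For these samples the rank of each product never exceeds the rank of either factor, and it equals the rank of A whenever the other factor is nonsingular.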


15. Matrix Addition and Multiplication
of a Matrix by a Scalar


For square matrices of order n, addition is defined as follows.
The sum A + B of two square matrices $A = (a_{ij})$ and $B = (b_{ij})$
of order n is the matrix $C = (c_{ij})$, each element of which is equal
to the sum of the corresponding elements of matrices A and B:

$$c_{ij} = a_{ij} + b_{ij}$$ *

The addition of matrices thus defined will obviously be 
commutative and associative. The inverse operation also exists; 
the difference between the matrices A and B is a matrix composed 
of the differences of the corresponding elements of the given matrices. 
The role of zero is played by the zero matrix, composed entirely 
of zeros; this matrix will from now on be denoted by the symbol 0. 
There is no real danger of confusing a zero matrix and the number 
zero. 

The addition of square matrices and their multiplication as defined 
in Sec. 13 are related by the distributive laws. 

Indeed, suppose we have three matrices of order n, $A = (a_{ij})$,
$B = (b_{ij})$, $C = (c_{ij})$. Then for any i and j we have the obvious
equality

$$\sum_{s=1}^{n} (a_{is} + b_{is})c_{sj} = \sum_{s=1}^{n} a_{is}c_{sj} + \sum_{s=1}^{n} b_{is}c_{sj}$$

However, the left side of this equation is the element in the ith row
and jth column of the matrix (A + B) C, and the right side is the element
in the same position in the matrix AC + BC. This proves the
equation

$$(A + B)C = AC + BC$$


The equation C (A + B) = CA + CB is proved in exactly the 
same way: the noncommutativity of matrix multiplication quite 
naturally requires proof of these two distributive laws. 

Let us introduce the following definition of multiplication of 
matrices by a scalar. 

The product kA of a square matrix $A = (a_{ij})$ by a scalar k is the
matrix $A' = (a'_{ij})$ obtained by multiplying all elements of the
matrix A by k:

$$a'_{ij} = ka_{ij}$$

We have already encountered (Sec. 14) one such example of
multiplication of a matrix by a scalar: if matrix A is nonsingular,
and | A | = d, then its inverse, $A^{-1}$, and the adjoint $A^*$ are connected
by the equation

$$A^{-1} = d^{-1}A^*$$


As we know, any square matrix of order n may be regarded
as an $n^2$-dimensional vector: this correspondence between matrices
and vectors is one-to-one. The addition of matrices and the multiplication
of a matrix by a scalar defined here are then converted
into the addition of vectors and the multiplication of a vector
by a scalar. Thus, the collection of square matrices of order n may be
regarded as an $n^2$-dimensional vector space.

* Of course, one could define the matrix product in just as natural a way,
by multiplying the corresponding elements. However, such multiplication,
unlike that defined in Sec. 13, would not find any serious applications.

From this follows the truth of the following equations (here,
A and B are matrices of order n; k, l are scalars and 1 is the number
unity):

$$k(A + B) = kA + kB, \tag{1}$$
$$(k + l)A = kA + lA, \tag{2}$$
$$k(lA) = (kl)A, \tag{3}$$
$$1 \cdot A = A \tag{4}$$


Properties (1) and (2) connect multiplication of a matrix by 
a scalar with addition of matrices. At the same time, there is a very 
important relationship between the multiplication of a matrix by 
a scalar and multiplication of the matrices alone, namely, 


(kA) B = A (kB) = k (AB) (5) 


In words, if one of the factors in a product of matrices is multiplied 
by a scalar k, then the whole product is multiplied by k. 

Let there be matrices $A = (a_{ij})$ and $B = (b_{ij})$ and a scalar k.
Then for any i and j,

$$\sum_{s=1}^{n} (ka_{is})b_{sj} = k\sum_{s=1}^{n} a_{is}b_{sj}$$


The left side of this equation, however, is an element in the ith 
row and the jth column of matrix (kA) B, the right side is an element 
in the same place in matrix k (AB). This proves the equation 


(kA) B = k (AB) 


The equation A (kB) = k (AB) is proved in the same way. 

The operation of multiplication of a matrix by a scalar permits
introducing a new mode of matrix notation. Denote by $E_{ij}$ the
matrix in which unity lies at the intersection of the ith row and
the jth column, all other elements being zero. Setting i = 1, 2, ..., n,
and j = 1, 2, ..., n, we obtain $n^2$ such matrices $E_{ij}$,
which are connected, as may easily be verified, by the following
multiplication table:

$$E_{is}E_{sj} = E_{ij}, \qquad E_{is}E_{tj} = 0 \ \text{ for } s \neq t$$


The matrix $kE_{ij}$ differs from the matrix $E_{ij}$ solely in the fact
that it has the scalar k at the intersection of the ith row and the
jth column. Taking this into consideration and using the definition
of matrix addition, we get the following notation for an arbitrary
square matrix A:

$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}E_{ij} \tag{6}$$

The representation (6) of the matrix A is obviously unique.
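
The multiplication table of the matrix units and the decomposition of a matrix into them can be verified mechanically for a small order. This is only a sketch: the helper names are assumptions, and indices are 0-based, unlike the 1-based subscripts of the text.

```python
from itertools import product

def zero(n):
    """Zero matrix of order n."""
    return [[0] * n for _ in range(n)]

def unit(i, j, n):
    """Matrix unit: 1 at row i, column j (0-based), zeros elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Row-by-column product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def table_holds(n):
    """Check E_is E_tj = E_ij when s == t, and the zero matrix otherwise."""
    return all(
        matmul(unit(i, s, n), unit(t, j, n)) == (unit(i, j, n) if s == t else zero(n))
        for i, s, t, j in product(range(n), repeat=4)
    )

def from_units(a):
    """Rebuild a square matrix as the sum of a_ij * E_ij over all i, j."""
    n = len(a)
    total = zero(n)
    for i, j in product(range(n), repeat=2):
        e = unit(i, j, n)
        total = [[total[r][c] + a[i][j] * e[r][c] for c in range(n)] for r in range(n)]
    return total

sample = [[1, -2, 0], [4, 5, 7], [0, 3, -6]]   # arbitrary sample matrix
```

Here `table_holds(3)` runs through all index combinations for order 3, and `from_units(sample)` returns the sample matrix itself.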
The matrix kE, where E is the unit matrix, has, by the definition
of multiplication of a matrix by a scalar, the following form:

$$kE = \begin{pmatrix} k & & 0 \\ & \ddots & \\ 0 & & k \end{pmatrix}$$

that is to say, one and the same scalar k on the principal diagonal
and all other elements zero. Such matrices are called scalar matrices.
The definition of matrix addition leads to the equation

$$kE + lE = (k + l)E \tag{7}$$


On the other hand, using the definition of matrix multiplication 
or proceeding from (5), we get 


$$kE \cdot lE = (kl)E \tag{8}$$


Multiplication of matrix A by a scalar k may be interpreted as 
multiplication of A by a scalar matrix kE in the meaning of multi- 
plication of matrices. Indeed, by (5), 


(kE) A = A (kE) = kA 


The conclusion to be drawn here is that every scalar matrix 
commutes with any matrix A. It is very important to point out that 
scalar matrices are the only ones with this property. 

If a matrix $C = (c_{ij})$ of order n commutes with any matrix of the
same order, then C is a scalar matrix.

Indeed, take $i \neq j$ and consider the products $CE_{ij}$ and $E_{ij}C$
(which by hypothesis are equal; see the above definition of the matrix
$E_{ij}$). It is clear that all columns of matrix $CE_{ij}$, except the jth,
consist of zeros, and the jth column coincides with the ith column
of matrix C; in particular, element $c_{ii}$ lies at the intersection of
the ith row and the jth column of matrix $CE_{ij}$. Similarly, all the
rows of matrix $E_{ij}C$, except the ith, consist of zeros, and the ith
row coincides with the jth row of matrix C; at the intersection of
the ith row and the jth column of matrix $E_{ij}C$ lies the element $c_{jj}$.
Using the equality $CE_{ij} = E_{ij}C$, we find that $c_{ii} = c_{jj}$ (as elements
in the same positions of equal matrices), which is to say that all
elements of the principal diagonal of matrix C are equal. On the
other hand, element $c_{ji}$ lies at the intersection of the jth row and
the jth column of matrix $CE_{ij}$; but in matrix $E_{ij}C$ we have a zero
at this site (because $i \neq j$), and therefore $c_{ji} = 0$, that is, every off-diagonal
element of matrix C is zero. The theorem is proved.
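
The argument of the proof suggests a simple test: a matrix that commutes with every matrix unit must be scalar. The sketch below illustrates this on three samples; the function names are assumptions and indices are 0-based.

```python
def unit(i, j, n):
    """Matrix unit: 1 at row i, column j (0-based), zeros elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Row-by-column product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def commutes_with_all_units(c):
    """True if C E_ij == E_ij C for every matrix unit E_ij of the same order."""
    n = len(c)
    return all(
        matmul(c, unit(i, j, n)) == matmul(unit(i, j, n), c)
        for i in range(n)
        for j in range(n)
    )

scalar_matrix = [[4, 0], [0, 4]]      # kE with k = 4
diagonal_matrix = [[1, 0], [0, 2]]    # unequal diagonal elements
triangular_matrix = [[4, 1], [0, 4]]  # nonzero off-diagonal element
```

Only the scalar sample passes the test; the other two fail for exactly the reasons given in the proof (unequal diagonal elements in one case, a nonzero off-diagonal element in the other).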


16. An Axiomatic Construction
of the Theory of Determinants


An nth-order determinant is a number which is uniquely defined 
by a given square matrix of order n. The definition of this concept 
given in Sec. 4 points to a rule by which a determinant can be 
expressed in terms of the elements of the given matrix. This construc- 
tive definition may, however, be replaced by an axiomatic definition. 
In other words, it is possible to point out, among the properties
of a determinant that were established in Secs. 4 and 6, such properties
that the determinant is the sole function of a real matrix having
these properties.

The simplest definition of this kind consists in utilizing the
expansion of a determinant in terms of a row. Let us consider square
matrices of any order and let us assume that any such matrix M
is associated with a number $d_M$ and that the following conditions hold.

(1) If the matrix M is of order one, that is, if it consists of
a single element a, then $d_M = a$.

(2) If the first row of a matrix M of order n is made up of the
elements $a_{11}, a_{12}, \dots, a_{1n}$ and if $M_i$, i = 1, 2, ..., n, denotes
the matrix of order n - 1 which remains after deleting from M the
first row and the ith column, then

$$d_M = a_{11}d_{M_1} - a_{12}d_{M_2} + a_{13}d_{M_3} - \dots + (-1)^{n-1}a_{1n}d_{M_n}$$

Then for any matrix M, the number $d_M$ is equal to the determinant
of that matrix. We leave it to the reader to carry out the proof of
this assertion, which is done by induction with respect to n and
utilizes the results of Sec. 6.
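
Conditions (1) and (2) translate directly into a recursive procedure. The following sketch is an illustration only: the function name `d` and the sample matrices are assumptions, and the column index is 0-based, so the sign factor is (-1) to the power i.

```python
def d(m):
    """Number d_M defined by conditions (1) and (2): expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]                       # condition (1)
    total = 0
    for i in range(n):
        # M_i: delete the first row and the ith column (0-based i)
        m_i = [row[:i] + row[i + 1:] for row in m[1:]]
        total += (-1) ** i * m[0][i] * d(m_i)  # condition (2)
    return total

sample = [[3, -1, 0], [-2, 1, 1], [2, -1, 4]]
```

The recursion costs on the order of n! operations, so it serves as a definition rather than as a practical method of computation.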

Much more interesting are some other forms of an axiomatic 
definition of a determinant which refer solely to the case of a given 
order n and have for a basis some of the simplest determinant pro- 
perties that were established in Sec. 4. Let us examine one of these 
definitions. 

Let any square matrix M of order n be associated with a number
$d_M$, and let the following conditions hold true.

I. If one of the rows of matrix M is multiplied by a number k, then the number
$d_M$ is also multiplied by k.

II. The number $d_M$ is not changed if to one of the rows of M we
add another row of this matrix.

III. If E is the unit matrix, then $d_E = 1$.

We shall prove that for any matrix M the number $d_M$ is equal
to the determinant of the matrix.

Let us first derive from Conditions I to III certain properties
of the number $d_M$ that are analogous to the corresponding properties
of a determinant.

(1) If one of the rows of matrix M consists of zeros, then $d_M = 0$.

Indeed, by multiplying a row consisting of zeros by the number 0,
we do not change the matrix, but because of Condition I, the number
$d_M$ acquires the factor 0. Therefore

$$d_M = 0 \cdot d_M = 0$$

(2) The number $d_M$ does not change if to the ith row of matrix M
we add its jth row, $j \neq i$, multiplied by a scalar k.

If k = 0, there is nothing to prove. If $k \neq 0$, then we multiply
the jth row by k and obtain a matrix M' for which, because of
Condition I, $d_{M'} = kd_M$. Then to the ith row of matrix M' we
add the jth row and obtain the matrix M'', and, because of Condition II,
$d_{M''} = d_{M'}$. Finally, we multiply the jth row of matrix M''
by the scalar $k^{-1}$. We arrive at matrix M''', which is actually
obtained from M by the transformation indicated in the formulation
of the property being proved; note that

$$d_{M'''} = k^{-1}d_{M''} = k^{-1}d_{M'} = k^{-1} \cdot kd_M = d_M$$


(3) If the rows of matrix M are linearly dependent, then $d_M = 0$.

Indeed, if one of the rows, say the ith, is a linear combination
of the other rows, then, applying transformation (2) several times,
it is possible to replace the ith row by a row of zeros. Transformation (2)
does not change the number $d_M$ and so, by Property (1),
$d_M = 0$.

(4) If the ith row of matrix M is a sum of two vectors β and γ and
if matrices M' and M'' are obtained from M by replacing its ith row
by the vectors β and γ, respectively, then

$$d_M = d_{M'} + d_{M''}$$

Let S be the system of all rows of matrix M, except the ith.
If there is a linear dependence in S, then the rows of each one of
the matrices M, M', M'' are linearly dependent, and therefore,
by Property (3), $d_M = d_{M'} = d_{M''} = 0$, whence in that case
follows the truth of the property being proved. Now if the system S,
consisting of n - 1 vectors, is linearly independent, then, as the
results of Sec. 9 show, a vector α may be adjoined to it to form a maximal
linearly independent system of vectors of n-dimensional vector
space. It is possible to express the vectors β and γ linearly in terms
of this system. Let vector α enter these expressions with the coefficients
k and l, respectively; vector α will then enter the expression
for the vector β + γ, that is, for the ith row of matrix M, with
the coefficient k + l. Matrices M, M' and M'' can now be transformed
by subtracting from their ith rows certain linear combinations of
other rows so that the vectors (k + l)α, kα and lα will serve respectively
as their ith rows. Therefore, denoting by $M^0$ the matrix
obtained from matrix M by replacing its ith row by the vector α
and taking into account Property (2) and Condition I, we arrive at the equations

$$d_M = (k + l)d_{M^0}, \qquad d_{M'} = kd_{M^0}, \qquad d_{M''} = ld_{M^0}$$

The proof of Property (4) is complete.

(5) If matrix $\bar{M}$ is obtained from matrix M by interchanging
two rows, then $d_{\bar{M}} = -d_M$.

Suppose it is necessary in matrix M to interchange the rows
with subscripts i and j. This can be achieved by a chain of transformations:
first add to the ith row of M its jth row and get matrix
M'; by Condition II, $d_{M'} = d_M$. Then from the jth row of M' subtract
its ith row and arrive at the matrix M'', for which, by Property
(2), we have $d_{M''} = d_{M'}$; the jth row of M'' will differ in sign from
the ith row of M. Now add to the ith row of M'' its jth row. For
the matrix M''', which this manipulation yields, we have, by Condition II,
$d_{M'''} = d_{M''}$, and the ith row of this matrix coincides with
the jth row of matrix M. Finally, multiplying the jth row of M'''
by -1, we arrive at the desired matrix $\bar{M}$. Therefore, by Condition I,

$$d_{\bar{M}} = -d_{M'''} = -d_M$$


(6) If matrix M' is obtained from matrix M by interchanging rows,
the $\alpha_i$-th row of matrix M serving as the ith row of matrix M', i = 1, 2, ..., n, then

$$d_{M'} = \pm d_M$$

The plus sign corresponds to the case when the permutation

$$\begin{pmatrix} 1 & 2 & \dots & n \\ \alpha_1 & \alpha_2 & \dots & \alpha_n \end{pmatrix}$$

is even; the minus sign, to the case when it is odd.

Indeed, matrix M’ may be obtained from matrix M by a number 
of transpositions of two rows, and for this reason we can take advan- 
tage of Property (5). The parity of the number of these transposi- 
tions determines, as we know from Sec. 3, the parity of the above- 
given permutation. 
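
The properties derived so far are already enough to compute the number associated with a matrix by row reduction. The sketch below is one possible illustration, not the author's procedure: the function name is hypothetical, exact fractions are used, and each step is annotated with the condition or property that justifies it.

```python
from fractions import Fraction

def d_from_axioms(m):
    """Reduce M to the unit matrix by row operations, tracking how d_M changes."""
    rows = [[Fraction(x) for x in row] for row in m]
    n = len(rows)
    factor = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if rows[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)        # Property (3): the rows are linearly dependent
        if pivot != c:
            rows[c], rows[pivot] = rows[pivot], rows[c]
            factor = -factor          # Property (5): a row interchange changes the sign
        p = rows[c][c]
        rows[c] = [x / p for x in rows[c]]
        factor *= p                   # Condition I: the row was divided by p
        for i in range(n):
            if i != c and rows[i][c] != 0:
                f = rows[i][c]
                # Property (2): subtracting a multiple of another row changes nothing
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[c])]
    return factor                     # Condition III: the unit matrix has d_E = 1
```

Since only Conditions I to III and their consequences are used, any function satisfying the three conditions must return the same value on a given matrix, which is the uniqueness that the text goes on to prove.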

Now let us consider the matrices $M = (a_{ij})$, $N = (b_{ij})$ and
their product Q = MN in the meaning of Sec. 13. We find the
number $d_Q$. We know that any ith row of matrix Q is the sum of
all the rows of matrix N taken, respectively, with the coefficients
$a_{i1}, a_{i2}, \dots, a_{in}$ (see, for example, Sec. 14). Replace all the rows
of Q by their indicated linear expressions in terms of the rows of
matrix N and take advantage of Property (4) several times. We
find that the number $d_Q$ will equal the sum of the numbers $d_T$ for
all possible matrices T of the following kind: the ith row of T,
i = 1, 2, ..., n, is equal to the $\alpha_i$-th row of matrix N multiplied
by the scalar $a_{i\alpha_i}$. Here, because of Property (3), we can disregard
all matrices T for which there exist subscripts i and j, $i \neq j$, such
that $\alpha_i = \alpha_j$; in other words, what remain are only matrices T
for which the subscripts $\alpha_1, \alpha_2, \dots, \alpha_n$ constitute an arrangement
of the numbers 1, 2, ..., n. Because of Condition I and Property (6), the
number $d_T$ for such a matrix is of the form

$$d_T = \pm a_{1\alpha_1}a_{2\alpha_2} \dots a_{n\alpha_n}d_N$$

where the sign is determined by the parity of the permutation formed
from the subscripts. Whence we arrive at the expression for the
number $d_Q$: after factoring the common factor $d_N$ out of all summands
of the type $d_T$, what we obviously have left in the parentheses is the
determinant | M | of the matrix M in the sense of the constructive
definition as given in Sec. 4, i.e.,

$$d_Q = |M| \cdot d_N \tag{*}$$

If we now take the unit matrix E for the matrix N, then Q = M,
and, by Condition III, $d_N = d_E = 1$, that is, for any matrix M we
have the equality

$$d_M = |M|$$


which is what we set out to prove. At the same time, once again,
and without the use of the Laplace theorem, we have proved the
multiplication theorem for determinants: all that needs to be done
is, in equation (*), to replace the numbers $d_Q$ and $d_N$ by the determinants
of the respective matrices.
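
The multiplication theorem can be checked on samples with the determinant written as a signed sum over permutations, in the spirit of the constructive definition. This is a sketch with assumed names and sample matrices; the parity of a permutation is found by counting inversions.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det(m):
    """Sum over all permutations of the signed products a_{1 p(1)} ... a_{n p(n)}."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def matmul(a, b):
    """Row-by-column product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

M = [[3, -1, 0], [-2, 1, 1], [2, -1, 4]]   # sample matrices (assumptions)
N = [[1, 0, 2], [0, 1, 0], [1, 0, 1]]
Q = matmul(M, N)
```

For these samples the determinant of Q equals the product of the determinants of M and N, as equation (*) requires.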

We conclude these axiomatic considerations with a proof of the
independence of Conditions I to III, that is, a proof that none of
these conditions is a consequence of the other two.

To prove the independence of Condition III, assume that $d_M = 0$
for any matrix M of order n. Conditions I and II will obviously
be fulfilled, but III breaks down.

To prove the independence of Condition II, assume that for
any matrix M the number $d_M$ is equal to the product of the elements
in the principal diagonal of the matrix. Conditions I and III are
fulfilled, Condition II breaks down.

Finally, to prove the independence of Condition I, assume that
$d_M = 1$ for any matrix M. Conditions II and III will be fulfilled
but Condition I fails.
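
The three counterexamples can be tried on a sample matrix. The sketch below uses hypothetical helper names; note that a check on one sample can exhibit the failure of a condition but does not by itself prove that the other two conditions hold in general.

```python
def scale_row(m, i, k):
    """Copy of m with row i multiplied by k."""
    return [[k * x for x in row] if r == i else list(row) for r, row in enumerate(m)]

def add_row(m, i, j):
    """Copy of m with row j added to row i."""
    return [[x + y for x, y in zip(row, m[j])] if r == i else list(row)
            for r, row in enumerate(m)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def check(d, m, k=3):
    """Test Conditions I, II, III for the function d on the sample matrix m."""
    return {
        "I": d(scale_row(m, 0, k)) == k * d(m),
        "II": d(add_row(m, 0, 1)) == d(m),
        "III": d(identity(len(m))) == 1,
    }

def d_zero(m):
    return 0                          # d_M = 0 for every matrix

def d_diag(m):
    result = 1
    for i in range(len(m)):
        result *= m[i][i]             # product of the principal diagonal
    return result

def d_one(m):
    return 1                          # d_M = 1 for every matrix

M = [[1, 2], [3, 4]]                  # sample matrix (an assumption)
```

On this sample, `d_zero` fails only Condition III, `d_diag` only Condition II, and `d_one` only Condition I.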


CHAPTER 4 


COMPLEX NUMBERS 


17. The System of Complex Numbers 


During the course of elementary algebra the range of numbers 
is expanded several times. The beginning student of algebra brings 
with him from arithmetic a knowledge of positive integers and 
fractions. Algebra actually begins with the introduction of negative 
numbers, thus establishing the first of the important number 
systems, the system of integers, which consists of all the positive 
and all the negative integers and zero, and the broader system of 
rational numbers consisting of all integers and all fractions (both 
positive and negative). 

A further extension of the number realm is the introduction 
of the irrational numbers. The system consisting of all rational and 
all irrational numbers is the system of real numbers. A university 
course of mathematical analysis usually contains a rigorous construc- 
tion of the system of real numbers; however, for our purposes in this 
course the knowledge of the real numbers that the reader has when 
he takes up the study of higher algebra will suffice. 

Finally, at the very end of the course of elementary algebra, 
the system of real numbers is extended to the system of complex 
numbers. Of course this system of numbers is less common than the 
system of real numbers, though actually it possesses many very 
good properties. In this chapter we recapitulate with sufficient 
completeness the theory of complex numbers.

Complex numbers are introduced in connection with the following 
problem. We know that the real numbers do not suffice for us to 
solve every quadratic equation with real coefficients. The simplest 
of the quadratics that does not have any roots in the class of real 
numbers is 

$$x^2 + 1 = 0 \tag{1}$$

We will only be interested in this equation for the present. The 
problem confronting us is: to extend the system of real numbers to 
a system of numbers that will supply us with a root for equation (1). 
As construction material for this new system of numbers, let
us take advantage of points in a plane. It will be recalled that the
depicting of real numbers by points of the straight line (this is
based on the fact that we obtain a one-to-one correspondence between
the set of all points of the line and the set of all real numbers if, for
a given origin of coordinates and a scale unit, every point of the
line is associated with an abscissa) is systematically utilized in all
divisions of mathematics and is so customary that ordinarily
we do not make any distinction between a real number and the
point that depicts it.

Thus, we wish to define a system of numbers correlated with all 
points in the plane. Up till now we have not had to add or multiply 
points of a plane, and so we can define the operations involving 
points, taking care only that the new system of numbers should 
possess all the properties intended for it. These definitions, parti- 
cularly for products, will at first appear to be rather artificial. 
In Chapter 10, it will be shown however that no other definitions 
of operations, which at first glance may seem more natural, would 
give us what we want; that is, they would not result in the construc- 
tion of an extension of the system of real numbers containing the 
root of equation (1). It will also be demonstrated there that replacing 
the points of a plane by any other material would not have led 
to a system of numbers whose algebraic properties differ from the 
system of complex numbers which we will construct below. 

We have a plane and we choose a rectangular system of coordinates.
Let us agree to denote points of the plane by the letters
α, β, γ, ... and to write a point α with abscissa a and ordinate b
as (a, b); that is, departing somewhat from what is accepted in
analytic geometry, we write α = (a, b). If we have points α = (a, b)
and β = (c, d), then the sum of these points will be the point with
abscissa a + c and ordinate b + d, or

$$(a, b) + (c, d) = (a + c, \ b + d) \tag{2}$$

For the product of the points α = (a, b) and β = (c, d) we will have
the point with abscissa ac - bd and with ordinate ad + bc, or

$$(a, b)(c, d) = (ac - bd, \ ad + bc) \tag{3}$$


We have thus defined two algebraic operations on the set of 
all points in the plane. We will show that these operations have 
all the basic properties possessed by operations in the system of real 
numbers or in the system of rational numbers; both are commutative 
and associative, connected by the distributive law, and have inverse 
operations—subtraction and division (except by zero). 

Commutativity and associativity of addition are obvious (more 
precisely, they follow from the corresponding properties of the 
addition of real numbers) since in the process of adding points of 
the plane we separately add their abscissas and their ordinates.
The commutativity of multiplication is based on the fact that the
points α and β enter the definition of a product symmetrically.
The following equations prove associativity of multiplication:

$$[(a, b)(c, d)](e, f) = (ac - bd, \ ad + bc)(e, f) = (ace - bde - adf - bcf, \ acf - bdf + ade + bce),$$
$$(a, b)[(c, d)(e, f)] = (a, b)(ce - df, \ cf + de) = (ace - adf - bcf - bde, \ acf + ade + bce - bdf)$$

The distributive law follows from the equations

$$[(a, b) + (c, d)](e, f) = (a + c, \ b + d)(e, f) = (ae + ce - bf - df, \ af + cf + be + de),$$
$$(a, b)(e, f) + (c, d)(e, f) = (ae - bf, \ af + be) + (ce - df, \ cf + de) = (ae - bf + ce - df, \ af + be + cf + de)$$
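
The laws just verified symbolically can also be spot-checked on concrete points. This is a minimal sketch: the function names `add` and `mul` and the sample points are assumptions, with points represented as pairs of integers.

```python
from itertools import product

def add(p, q):
    """Formula (2): (a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """Formula (3): (a, b)(c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

samples = [(2, 3), (-1, 4), (0, -5), (7, 0)]   # arbitrary integer points

def laws_hold(points):
    """Check commutativity, associativity and distributivity on all triples."""
    return all(
        mul(x, y) == mul(y, x)
        and mul(mul(x, y), z) == mul(x, mul(y, z))
        and mul(add(x, y), z) == add(mul(x, z), mul(y, z))
        for x, y, z in product(points, repeat=3)
    )
```

Such a check on finitely many points is only a sanity test; the general statement rests on the identities displayed above.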
Let us examine the inverse operations. If we have the points
α = (a, b) and β = (c, d), then their difference is a point (x, y)
such that

$$(c, d) + (x, y) = (a, b)$$

Whence, by (2),

$$c + x = a, \qquad d + y = b$$

Thus, the difference of the points α = (a, b) and β = (c, d) is the
point

$$\alpha - \beta = (a - c, \ b - d) \tag{4}$$

and this difference is defined in unique fashion. In particular, zero
is the coordinate origin (0, 0); the opposite point of α = (a, b)
is the point

$$-\alpha = (-a, -b) \tag{5}$$


Now, suppose we have the points α = (a, b) and β = (c, d),
and suppose point β is nonzero; that is, at least one of the coordinates c,
d is nonzero, and therefore $c^2 + d^2 \neq 0$. The quotient of α divided
by β must be a point (x, y) such that (c, d) (x, y) = (a, b). Whence,
by (3),

$$cx - dy = a,$$
$$dx + cy = b$$

Solving this system of equations, we obtain

$$x = \frac{ac + bd}{c^2 + d^2}, \qquad y = \frac{bc - ad}{c^2 + d^2}$$

Thus, for β ≠ 0 the quotient α/β exists and is unambiguously defined:

$$\frac{\alpha}{\beta} = \left( \frac{ac + bd}{c^2 + d^2}, \ \frac{bc - ad}{c^2 + d^2} \right) \tag{6}$$

Assuming β = α, we find that in our multiplication of points unity
is the point (1, 0) lying on the axis of abscissas at a distance 1 to the
right of the origin. Also assuming in (6) that α = 1 = (1, 0),
we find that for β ≠ 0, the inverse of β is

$$\beta^{-1} = \left( \frac{c}{c^2 + d^2}, \ \frac{-d}{c^2 + d^2} \right) \tag{7}$$
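
Formulas (6) and (7) can be tried on a numerical case. The sketch below assumes points with integer coordinates and uses exact fractions; the function names and sample points are illustrative choices.

```python
from fractions import Fraction

def mul(p, q):
    """Formula (3): (a, b)(c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

def divide(p, q):
    """Formula (6): quotient of p = (a, b) by a nonzero point q = (c, d)."""
    (a, b), (c, d) = p, q
    norm = c * c + d * d
    if norm == 0:
        raise ZeroDivisionError("division by the zero point (0, 0)")
    return (Fraction(a * c + b * d, norm), Fraction(b * c - a * d, norm))

def inverse(q):
    """Formula (7): the quotient of unity (1, 0) by q."""
    return divide((1, 0), q)

alpha = (1, 2)
beta = (3, -4)
quotient = divide(alpha, beta)
```

Multiplying the quotient back by β returns α, and the product of β with its inverse is the unity (1, 0).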


We have thus constructed a system of numbers that can be depicted 
by points in the plane, and the operations on these numbers are 
defined by formulas (2) and (3). This system is called the system 
of complex numbers. 

Let us now show that the system of complex numbers is an extension 
of the system of real numbers. To do this, we consider points lying 
on the axis of abscissas, or points of the form (a, 0); associating 
a real number a with the point (a, 0), we evidently get a one-to-one 
correspondence between the set of points under consideration and 
the set of all the real numbers. Applying to these points formulas 
(2) and (3), we get 

$$(a, 0) + (b, 0) = (a + b, 0),$$
$$(a, 0) \cdot (b, 0) = (ab, 0)$$


i.e., points (a, 0) may be added and multiplied in the same way 
as the corresponding real numbers. Thus, the set of points on the 
axis of abscissas, considered as a part of the system of complex numbers, 
does not differ in its algebraic properties from the system of real numbers 
as ordinarily depicted by points on a straight line. This will enable 
us, in the future, to equate the point (a, 0) and the real number a, 
i.e., we will always assume (a, 0) = a. In particular, zero (0, 0)
and unity (1, 0) of the system of complex numbers turn out to be
the real numbers 0 and 1.

We now have to demonstrate that the complex numbers contain 
the root of equation (1), that is, a number whose square is equal 
to the real number —1. This is the point (0, 1), i.e., a point lying 
on the axis of ordinates at a distance 1 upwards from the origin. 
Indeed, using (3), we get 


$$(0, 1) \cdot (0, 1) = (-1, 0) = -1$$

Let us agree to denote this point by the letter i, so that $i^2 = -1$.
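
The embedding of the real numbers and the property of the point (0, 1) can be confirmed directly from formulas (2) and (3). This is an illustrative sketch with assumed names; the last helper compares the pair arithmetic with Python's built-in `complex` type.

```python
def add(p, q):
    """Formula (2): (a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """Formula (3): (a, b)(c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

def real(a):
    """The point (a, 0) that is identified with the real number a."""
    return (a, 0)

i = (0, 1)   # the point on the axis of ordinates at distance 1 above the origin

def to_builtin(p):
    """Convert a point (a, b) to Python's built-in complex number a + bj."""
    return complex(p[0], p[1])
```

Points on the axis of abscissas add and multiply like the corresponding real numbers, and the square of `i` is the point identified with -1.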


Finally, let us show how the customary notation of the complex
numbers we have constructed can be obtained. First find the product
of a real number b and the point i:

$$bi = (b, 0) \cdot (0, 1) = (0, b)$$

This is a point, consequently, which lies on the ordinate axis and
has ordinate b; all points of the ordinate axis may be represented
by such products. Now if (a, b) is an arbitrary point, then because
of the equation

$$(a, b) = (a, 0) + (0, b)$$

we get

$$(a, b) = a + bi$$

In other words, we have arrived at the customary notation of complex
numbers; the product and sum in the expression a + bi are to
be understood, of course, in the sense of operations defined in the
system of complex numbers we have constructed.

Now that we have constructed the complex numbers, the reader 
will have no difficulty in verifying that all the preceding chapters 
of this book—the theory of determinants, the theory of systems of 
linear equations, the theory of the linear dependence of vectors, 
and the theory of matrix operations—carry over without any restric- 
tions from real numbers to all complex numbers. 

Note, in conclusion, that the foregoing construction of the system 
of complex numbers raises the following question. Is it possible 
to define addition and multiplication of points in three-dimensional 
space so that the collection of these points becomes a system of num- 
bers containing within it the system of complex numbers or at 
least the system of real numbers? This question goes beyond the 
scope of the present text, but the answer is no.

On the other hand, noting that the addition of complex numbers 
as defined above actually coincides with the addition of vectors 
(in a plane) emanating from a coordinate origin (see following 
section), it is natural to pose the question: is it possible, for a cer- 
tain n, to define the multiplication of vectors in an n-dimensional 
real vector space so that, relative to this multiplication and to 
ordinary addition of vectors, our space proves to be a number system 
containing the system of real numbers? It may be demonstrated 
that this cannot be done if we require the fulfillment of all the proper- 
ties of the operations which are valid in the systems of rational, 
real and complex numbers. However, if we reject commutativity 
of multiplication, then such a construction is possible in four-dimen- 
sional space; the resulting system of numbers is called the system 
of quaternions. A similar construction is also possible in eight- 
dimensional space. This yields what is called the system of Cayley
numbers. In this case, however, we have to give up not only the
commutativity of multiplication but also associativity, and replace
the latter by a weaker requirement.


18. A Deeper Look at Complex Numbers


In keeping with historically evolved traditions, we call the 
complex number i the imaginary unit, and numbers of the form 
bi, pure imaginaries, although we have no doubt about the existence 
of such numbers and we can indicate points of the plane (points 
on the axis of ordinates) which depict these numbers. In the notation
of the number $\alpha$ as $\alpha = a + bi$, the number $a$ is called the real part
of $\alpha$ and $bi$ is called its imaginary part. A plane with points identified
with complex numbers as indicated in Sec. 17 is called the complex
plane. The axis of abscissas (x-axis) is called the axis of reals since
its points depict the real numbers, and the axis of ordinates (y-axis)
of the complex plane is termed the axis of imaginaries.

The addition, multiplication, subtraction and division of complex
numbers written in the form $a + bi$ are performed in the following
manner, as follows from formulas (2), (4), (3) and (6) of the preceding
section:
$$(a + bi) + (c + di) = (a + c) + (b + d)i,$$
$$(a + bi) - (c + di) = (a - c) + (b - d)i,$$
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i,$$
$$\frac{a + bi}{c + di} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,i$$


In the addition of complex numbers, add separately the real parts and 
the imaginary parts. Similarly for subtraction. The formulas for 
multiplication and division would be too involved if given verbally. 
The last formula need not be memorized; simply bear in mind that 
it may be derived by multiplying the numerator and denominator 
of the given fraction by a number different from the denominator 
solely in the sign of the imaginary part. Indeed, 


$$\frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{(c + di)(c - di)} = \frac{(ac + bd) + (bc - ad)i}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,i$$
Examples.

(1) $(2 + 5i) + (1 - 7i) = (2 + 1) + (5 - 7)i = 3 - 2i$.

(2) $(3 - 9i) - (7 + i) = (3 - 7) + (-9 - 1)i = -4 - 10i$.

(3) $(1 + 2i)(3 - i) = [1\cdot 3 - 2\cdot(-1)] + [1\cdot(-1) + 2\cdot 3]i = 5 + 5i$.

(4) $\dfrac{23 + i}{3 + i} = \dfrac{(23 + i)(3 - i)}{3^2 + 1^2} = \dfrac{70 - 20i}{10} = 7 - 2i$.
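The four rules of arithmetic stated above can be tried out mechanically. The following is only a sketch: it stores a complex number a + bi as the pair (a, b), and the function names `add`, `sub`, `mul` and `div` are illustrative choices, not notation from the text. Exact rational arithmetic is assumed for the quotient so that no rounding enters.

```python
from fractions import Fraction

# A complex number a + bi is represented by the pair (a, b).

def add(z, w):
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def sub(z, w):
    (a, b), (c, d) = z, w
    return (a - c, b - d)

def mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def div(z, w):
    (a, b), (c, d) = z, w
    n = c * c + d * d  # square of the modulus of the divisor; must be nonzero
    return (Fraction(a * c + b * d, n), Fraction(b * c - a * d, n))

print(mul((1, 2), (3, -1)))   # the pair for (1 + 2i)(3 - i)
print(div((23, 1), (3, 1)))   # the pair for (23 + i)/(3 + i)
```

Python's built-in `complex` type performs the same operations in floating point; the pair form is used here only to mirror the formulas term by term.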

The portrayal of complex numbers as points in a plane results
in a natural desire to have a geometric interpretation of the operations
involving complex numbers. For addition, this interpretation
is simple. Suppose we have the numbers $\alpha = a + bi$ and $\beta = c + di$.
Join the corresponding points $(a, b)$ and $(c, d)$ with line segments
to the origin and construct a parallelogram on these segments,
as sides, as shown in Fig. 2. The fourth vertex of the parallelogram
will obviously be the point $(a + c, b + d)$. Thus, the addition
of complex numbers geometrically is accomplished in accord with
the parallelogram rule, which is to say by the rule of addition of vectors
emanating from the coordinate origin. Also, the number opposite
to $\alpha = a + bi$ is a point in the complex
plane that is symmetric to $\alpha$ about the origin
(Fig. 3). This gives the geometric interpretation of subtraction.

[Fig. 2]

The geometric meaning of multiplication and division of complex numbers will
become clear only after we introduce a new
notation for them that differs from that
used heretofore. The notation of $\alpha$ as
$\alpha = a + bi$ makes use of the Cartesian
coordinates of a point corresponding to that
number. However, the position of a point
in the plane is also completely defined by specifying its polar coordinates:
the distance $r$ from the origin to the point and the angle
$\varphi$ between the positive x-axis (axis of abscissas) and the direction
from the origin to the point (Fig. 4).

The number $r$ is a nonnegative real number which is zero only
at the point 0. For $\alpha$ on the real axis (that is to say, for $\alpha$ a real
number), the number $r$ is the absolute value of $\alpha$; for this reason,
for any complex number $\alpha$, the number $r$ is sometimes called the
absolute value of $\alpha$; more often, however, the number $r$ is called
the modulus of the number $\alpha$ and is denoted by $|\alpha|$.

[Fig. 3 and Fig. 4]

The angle $\varphi$ is called the argument of the number $\alpha$ and is denoted
by $\arg\alpha$ [we thus dispense with the customary names of the polar
coordinates of a point: the radius vector and the polar (or vectorial)
angle]. The angle $\varphi$ can take on any real values (positive or negative),
the positive angles being reckoned counterclockwise. But
if the angles differ by $2\pi$ or a multiple of $2\pi$, then the points they
depict in the plane will be coincident.




Thus, the argument of a complex number $\alpha$ has an infinity
of values differing by integral multiples of the number $2\pi$; from
the equality of two complex numbers specified by their moduli
and arguments one can only conclude, consequently, that the arguments
differ by an integral multiple of $2\pi$, whereas the moduli are
the same. It is only for the number 0 that the argument is not defined.
However, this number is fully determined by the equation $|0| = 0$.

The argument of a complex number is a natural generalization
of the sign of a real number. The argument of a positive real number
is zero, the argument of a negative real number is $\pi$. There are
only two directions out of the origin on the axis of reals and they
may be distinguished by two symbols: $+$ and $-$. Now in the complex
plane, there are infinitely many directions issuing from the point 0,
and they differ in the angle formed with the positive direction of
the real axis.

The Cartesian and polar coordinates of a point are connected
by the following relation which holds true for any position of points
in the plane:
$$a = r\cos\varphi, \quad b = r\sin\varphi \tag{1}$$
Whence
$$r = +\sqrt{a^2 + b^2} \tag{2}$$

Let us apply formulas (1) to an arbitrary complex number
$\alpha = a + bi$:
$$\alpha = a + bi = r\cos\varphi + (r\sin\varphi)i$$
or
$$\alpha = r(\cos\varphi + i\sin\varphi) \tag{3}$$

Conversely, let the number $\alpha = a + bi$ admit a notation of the
form $\alpha = r_0(\cos\varphi_0 + i\sin\varphi_0)$, where $r_0$ and $\varphi_0$ are certain real
numbers and $r_0 > 0$. Then $r_0\cos\varphi_0 = a$, $r_0\sin\varphi_0 = b$, whence
$r_0 = +\sqrt{a^2 + b^2}$, that is, by (2), $r_0 = |\alpha|$. Whence, using (1), we
get $\cos\varphi_0 = \cos\varphi$, $\sin\varphi_0 = \sin\varphi$, that is $\varphi_0 = \arg\alpha$. Thus, any complex
number $\alpha$ is uniquely defined by (3), where $r = |\alpha|$, $\varphi = \arg\alpha$
(the argument $\varphi$ being of course defined only to within multiples
of $2\pi$). This notation of the number $\alpha$ is called the trigonometric
form and will be used very often in the sequel.
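Passing between the form a + bi and the trigonometric form can be illustrated numerically. This is a sketch under the assumption that Python's standard `cmath` module is used; the helper names `to_trig` and `from_trig` are illustrative. Note that `cmath.polar` returns one particular value of the argument (in the interval from $-\pi$ to $\pi$), whereas the argument itself is defined only to within multiples of $2\pi$.

```python
import cmath
import math

def to_trig(z):
    """Return (r, phi) with z = r(cos phi + i sin phi), phi in [-pi, pi]."""
    r, phi = cmath.polar(z)
    return r, phi

def from_trig(r, phi):
    """Rebuild a + bi from the modulus r and an argument phi, by formulas (1)."""
    return complex(r * math.cos(phi), r * math.sin(phi))

r, phi = to_trig(complex(1, 1))
print(r, phi)                             # modulus and one value of the argument
print(from_trig(r, phi + 2 * math.pi))    # an argument shifted by 2*pi gives the same point
```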

The numbers : 


T setae Ge 19 :.» 19 
a=3 (cos 7+isin 2), p=cos->n-+isin> n 


and 


y=V3 [ cos (—+) +isin (-7)] 
are given in trigonometric form; here $|\alpha| = 3$, $|\beta| = 1$, $|\gamma| = \sqrt{3}$;
n 19 qt n 13 
arg =, agp= zn, argy= y (or arg B=, arg y= n) é 
On the other hand, the complex numbers 


a’ ==: (— 2) (cos = +isin 2) , P =3 (cos $ n— ising E 


; F : ; 3 
y =2(cos $ +isin$ n), ô =sin m -+icos >a 
are not given in trigonometric form, although their notations resem- 
ble that of (3). In trigonometric form, these numbers look like 


a’ = 2 (cos $ a-+isine x), pP =3 (cos Fn+ising x), 


ere zach 

6 = cos 7 N+ ising i 
Finding the trigonometric form of the number $\gamma'$ involves difficulties
that are almost always encountered when passing from the customary 
notation of a complex number to its trigonometric notation and 
vice versa: with the exception of a few cases, it is impossible to 
find the exact angle on the basis of given numerical values of the 
sine and cosine, and it is impossible for a given angle to write the 
exact values of its sine and cosine. 

Let the complex numbers $\alpha$ and $\beta$ be given in trigonometric
form: $\alpha = r(\cos\varphi + i\sin\varphi)$, $\beta = r'(\cos\varphi' + i\sin\varphi')$. Multiplying
these numbers together, we get
$$\alpha\beta = [r(\cos\varphi + i\sin\varphi)]\cdot[r'(\cos\varphi' + i\sin\varphi')] = rr'(\cos\varphi\cos\varphi' + i\cos\varphi\sin\varphi' + i\sin\varphi\cos\varphi' - \sin\varphi\sin\varphi')$$
or
$$\alpha\beta = rr'[\cos(\varphi + \varphi') + i\sin(\varphi + \varphi')] \tag{4}$$

We have the product $\alpha\beta$ written in trigonometric form and so
$|\alpha\beta| = rr'$ or
$$|\alpha\beta| = |\alpha|\cdot|\beta| \tag{5}$$

In words, the modulus of a product of complex numbers is equal to the
product of the moduli of the factors. Also, $\arg(\alpha\beta) = \varphi + \varphi'$ or
$$\arg(\alpha\beta) = \arg\alpha + \arg\beta \tag{6}$$

The argument of a product of complex numbers is equal to the sum
of the arguments of the factors (note that equality here means to within
a multiple of $2\pi$). These rules obviously carry over to any finite
number of factors. As applied to real numbers, formula (5) yields
the familiar property of absolute values of the numbers, and
(6), as can readily be verified, turns into the rule of signs in the
multiplication of real numbers.
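Rules (5) and (6) can be checked on a numerical instance. The sketch below assumes Python's `cmath` module; the sample moduli and arguments, and the helper `same_angle`, are illustrative choices. The arguments are chosen so that their sum exceeds $\pi$, which shows why equality of arguments must be read to within a multiple of $2\pi$.

```python
import cmath

def same_angle(x, y, tol=1e-12):
    """True if the angles x and y differ by an integral multiple of 2*pi."""
    return abs(cmath.exp(1j * (x - y)) - 1) < tol

alpha = cmath.rect(3, 2.5)    # modulus 3, argument 2.5
beta = cmath.rect(0.5, 1.7)   # modulus 0.5, argument 1.7
prod = alpha * beta

print(abs(prod))              # close to 3 * 0.5
print(cmath.phase(prod))      # differs from 2.5 + 1.7 by 2*pi
print(same_angle(cmath.phase(prod), 2.5 + 1.7))
```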


Analogous rules are valid in the case of a quotient. Indeed, let
$\alpha = r(\cos\varphi + i\sin\varphi)$, $\beta = r'(\cos\varphi' + i\sin\varphi')$, $\beta \ne 0$, that is,
$r' \ne 0$. Then
$$\frac{\alpha}{\beta} = \frac{r(\cos\varphi + i\sin\varphi)}{r'(\cos\varphi' + i\sin\varphi')} = \frac{r(\cos\varphi + i\sin\varphi)(\cos\varphi' - i\sin\varphi')}{r'(\cos^2\varphi' + \sin^2\varphi')} = \frac{r}{r'}(\cos\varphi\cos\varphi' + i\sin\varphi\cos\varphi' - i\cos\varphi\sin\varphi' + \sin\varphi\sin\varphi')$$
or
$$\frac{\alpha}{\beta} = \frac{r}{r'}[\cos(\varphi - \varphi') + i\sin(\varphi - \varphi')] \tag{7}$$
Whence it follows that $\left|\frac{\alpha}{\beta}\right| = \frac{r}{r'}$ or
$$\left|\frac{\alpha}{\beta}\right| = \frac{|\alpha|}{|\beta|} \tag{8}$$

The modulus of a quotient of two complex numbers is equal to the modulus
of the dividend divided by the modulus of the divisor. Also,
$\arg\left(\frac{\alpha}{\beta}\right) = \varphi - \varphi'$ or
$$\arg\left(\frac{\alpha}{\beta}\right) = \arg\alpha - \arg\beta \tag{9}$$

The argument of a quotient of two complex numbers is obtained by
subtracting the argument of the divisor from the argument of the dividend.

[Fig. 5 and Fig. 6]


It is not difficult now to grasp the geometric meaning of multiplication
and division. Because of (5) and (6), we get a point depicting
the product of the number $\alpha$ by the number $\beta = r'(\cos\varphi' + i\sin\varphi')$
if the vector from 0 to $\alpha$ (Fig. 5) is rotated counterclockwise through
an angle $\varphi' = \arg\beta$ and then stretched by a factor $r' = |\beta|$ (for
$0 < r' < 1$ it will be a compression instead of a dilation). Also,
from (7) it follows that for $\alpha = r(\cos\varphi + i\sin\varphi) \ne 0$ we have
$$\alpha^{-1} = r^{-1}[\cos(-\varphi) + i\sin(-\varphi)] \tag{10}$$
i.e., $|\alpha^{-1}| = |\alpha|^{-1}$, $\arg(\alpha^{-1}) = -\arg\alpha$. We thus obtain point $\alpha^{-1}$
if from point $\alpha$ we go to point $\alpha'$ at a distance $r^{-1}$ from zero on the
same half-line emanating from zero as is point $\alpha$ (Fig. 6),* and then
go to a point symmetric to $\alpha'$ about the real axis.

A sum and difference of complex numbers given in trigonometric
form cannot be expressed by formulas similar to (4) and (7). However,
for the modulus of a sum we have the following important inequalities:
$$|\alpha| - |\beta| \le |\alpha + \beta| \le |\alpha| + |\beta| \tag{11}$$

In words, the modulus of a sum of two complex numbers is less than
or equal to the sum of the moduli of the terms but greater than or equal
to the difference of these moduli. Inequalities (11) follow from the
familiar theorem of elementary geometry concerning the sides of
a triangle because $|\alpha + \beta|$ is, as we know, equal to the diagonal
of a parallelogram with sides $|\alpha|$ and $|\beta|$. Incidentally, the case
for points $\alpha$, $\beta$ and 0 lying on one straight line requires a special
investigation, which we leave to the reader. It is only in this case
that the equalities are attained in formulas (11).

From (11), because $\alpha - \beta = \alpha + (-\beta)$ and
$$|-\beta| = |\beta| \tag{12}$$
(this equation follows at the very least from the geometric interpretation
of the number $-\beta$), also follow the inequalities
$$|\alpha| - |\beta| \le |\alpha - \beta| \le |\alpha| + |\beta| \tag{13}$$

That is, the same inequalities hold for the modulus of a difference
as for the modulus of a sum.

Inequalities (11) might be obtained in the following manner.
Let $\alpha = r(\cos\varphi + i\sin\varphi)$, $\beta = r'(\cos\varphi' + i\sin\varphi')$ and let
the trigonometric form of the number $\alpha + \beta$ be
$\alpha + \beta = R(\cos\psi + i\sin\psi)$. Adding the real and imaginary parts separately,
we obtain
$$r\cos\varphi + r'\cos\varphi' = R\cos\psi,$$
$$r\sin\varphi + r'\sin\varphi' = R\sin\psi$$
Multiplying both sides of the first equation by $\cos\psi$ and both sides
of the second by $\sin\psi$ and then adding, we get
$$r(\cos\varphi\cos\psi + \sin\varphi\sin\psi) + r'(\cos\varphi'\cos\psi + \sin\varphi'\sin\psi) = R(\cos^2\psi + \sin^2\psi)$$
That is,
$$r\cos(\varphi - \psi) + r'\cos(\varphi' - \psi) = R$$
Whence, since the cosine is never greater than unity, follows the
inequality $r + r' \ge R$, or $|\alpha| + |\beta| \ge |\alpha + \beta|$. On the other
hand, $\alpha = (\alpha + \beta) - \beta = (\alpha + \beta) + (-\beta)$, whence, by what has
been proved and by virtue of (12),
$$|\alpha| \le |\alpha + \beta| + |-\beta| = |\alpha + \beta| + |\beta|$$
From this, $|\alpha| - |\beta| \le |\alpha + \beta|$.

* $|\alpha'| = |\alpha|$ if and only if $|\alpha| = 1$, that is, if the point $\alpha$ lies on the
circumference of the unit circle. If $\alpha$ lies inside the unit circle, then $\alpha'$ will be
outside it, and vice versa. In this way we obviously obtain a one-to-one correspondence
between all points of the complex plane outside the unit circle
and all nonzero points within the unit circle.


It is well to note that for complex numbers the concepts of
“more than” and “less than” cannot be reasonably defined because
these numbers, in contrast to the real numbers, are not located
on a straight line, whose points are naturally ordered, but in a plane.
For this reason, complex numbers as such (not
their moduli) can never be connected by an
inequality sign.

Conjugate numbers. Suppose we have
a complex number $\alpha = a + bi$. The number
$a - bi$, which differs from $\alpha$ solely in the
sign in front of the imaginary part, is called
the conjugate of $\alpha$ and is denoted by $\bar\alpha$.

It will be recalled that when considering
the division of complex numbers we resorted
to conjugate numbers but did not introduce
that term.

The conjugate number of $\bar\alpha$ is obviously
$\alpha$; in other words, we can speak of a pair of
conjugate numbers. The real numbers are the only numbers which
are conjugate to themselves.

[Fig. 7]

Geometrically, conjugate numbers are points symmetric about
the real axis (Fig. 7). Whence follow the equations
$$|\bar\alpha| = |\alpha|, \quad \arg\bar\alpha = -\arg\alpha \tag{14}$$
The sum and product of conjugate complex numbers are real numbers.
Indeed,
$$\alpha + \bar\alpha = 2a, \qquad \alpha\bar\alpha = a^2 + b^2 = |\alpha|^2 \tag{15}$$
The last equation shows that the number $\alpha\bar\alpha$ is positive
whenever $\alpha \ne 0$. In Sec. 24 we will derive a theorem which shows that
the property proved here is characteristic of conjugate numbers.

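The equation
$$(a - bi) + (c - di) = (a + c) - (b + d)i$$
shows that the conjugate of a sum of two numbers is equal to the sum
of the conjugates of the numbers:
$$\overline{\alpha + \beta} = \bar\alpha + \bar\beta \tag{16}$$
Similarly, from the equation
$$(a - bi)(c - di) = (ac - bd) - (ad + bc)i$$
it follows that the conjugate of a product is equal to the product of the
conjugates of the factors:
$$\overline{\alpha\beta} = \bar\alpha\cdot\bar\beta \tag{17}$$
Direct verification also shows the following formulas to be valid:
$$\overline{\alpha - \beta} = \bar\alpha - \bar\beta, \tag{18}$$
$$\overline{\left(\frac{\alpha}{\beta}\right)} = \frac{\bar\alpha}{\bar\beta} \tag{19}$$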
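Formulas (15)-(19) lend themselves to a quick numerical spot check. The sketch below assumes Python's built-in `complex` type; the sample values of `alpha` and `beta` and the helper `conj` are illustrative. A check on one pair of numbers is of course no proof, only a safeguard against misreading the formulas.

```python
def conj(z):
    """Conjugate of a complex number: a + bi -> a - bi."""
    return z.conjugate()

alpha = complex(2, 5)
beta = complex(1, -7)

print(conj(alpha + beta), conj(alpha) + conj(beta))   # formula (16)
print(conj(alpha * beta), conj(alpha) * conj(beta))   # formula (17)
print(conj(alpha - beta), conj(alpha) - conj(beta))   # formula (18)
print(conj(alpha / beta), conj(alpha) / conj(beta))   # formula (19), up to rounding
print(alpha * conj(alpha), abs(alpha) ** 2)           # formula (15): a^2 + b^2
```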

We will now prove the following assertion: if a number $\alpha$ is in
some way expressed in terms of the complex numbers $\beta_1, \beta_2, \dots, \beta_n$
by means of addition, multiplication, subtraction and division, then
by replacing all the numbers $\beta_i$ in this expression by their conjugates,
we obtain the conjugate of $\alpha$; in particular, if $\alpha$ is a real number, it
does not change when all the complex numbers $\beta_i$ are replaced by
their conjugates.

We shall prove this assertion by means of induction with respect
to n, since for n = 2 it follows from formulas (16)-(19).

Let the number $\alpha$ be expressed by the numbers $\beta_1, \beta_2, \dots, \beta_n$,
not necessarily distinct. This expression gives a definite order in
which the operations of addition, multiplication, subtraction and
division are applied. The last step will be to apply one of these operations
to the number $\gamma_1$ expressed in terms of the numbers $\beta_1, \beta_2, \dots, \beta_k$,
where $1 \le k \le n - 1$, and to the number $\gamma_2$ expressed
in terms of the numbers $\beta_{k+1}, \dots, \beta_n$. By the induction hypothesis,
replacement of the numbers $\beta_1, \beta_2, \dots, \beta_k$ by their conjugates
implies a replacement of the number $\gamma_1$ by the number $\bar\gamma_1$, and
a replacement of the numbers $\beta_{k+1}, \beta_{k+2}, \dots, \beta_n$ by their conjugates
implies substitution of $\gamma_2$ by $\bar\gamma_2$. However, by one of the formulas
(16)-(19), the transition from $\gamma_1$ and $\gamma_2$ to $\bar\gamma_1$ and $\bar\gamma_2$ converts
the number $\alpha$ to $\bar\alpha$.

19. Taking Roots of Complex Numbers


Let us now examine the raising of complex numbers to a power
and the taking of roots. To raise a number $\alpha = a + bi$ to a positive
integral power n, it suffices to apply Newton’s binomial theorem
to the expression $(a + bi)^n$ (this formula holds true for complex
numbers as well, since its proof is based solely on the distributive
law) and then take advantage of the equations $i^2 = -1$, $i^3 = -i$,
$i^4 = 1$, whence, generally,
$$i^{4k} = 1, \quad i^{4k+1} = i, \quad i^{4k+2} = -1, \quad i^{4k+3} = -i$$

If a number $\alpha$ is given in trigonometric form, then for a positive
integral n, there follows from (4) of Sec. 18 the following formula
called De Moivre’s formula:
$$[r(\cos\varphi + i\sin\varphi)]^n = r^n(\cos n\varphi + i\sin n\varphi) \tag{1}$$
In raising a complex number to a power, raise the modulus to that power
and multiply the argument by the exponent. Formula (1) holds true
for negative integral exponents as well. Indeed, since $\alpha^{-n} = (\alpha^{-1})^n$,
it is sufficient to apply the De Moivre formula to the number $\alpha^{-1}$,
the trigonometric form of which is given by (10), Sec. 18.


Examples. 
(1) 87 = i, P= A. E 
(2) $(2 + 5i)^3 = 2^3 + 3\cdot 2^2\cdot 5i + 3\cdot 2\cdot 5^2 i^2 + 5^3 i^3 = 8 + 60i - 150 - 125i = -142 - 65i$.

(3) $\left[\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right)\right]^4 = (\sqrt{2})^4(\cos\pi + i\sin\pi) = -4$.


[3 (cos 5 5 tisin z)” 


E 3 58 3 _4 ee ad as 
== 3-3 [ cos (—+ x) +isin (-5 n) |= (cos 5 t+isin s x) 3 
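De Moivre's formula can be compared numerically with direct multiplication. The following sketch assumes Python's `cmath` and `math` modules; the function name `de_moivre` and the sample moduli, arguments and exponents are illustrative assumptions. It raises the modulus to the power n and multiplies the argument by n, for positive and negative integral n alike.

```python
import cmath
import math

def de_moivre(r, phi, n):
    """Value of [r(cos phi + i sin phi)]^n computed as r^n (cos n*phi + i sin n*phi)."""
    return cmath.rect(r ** n, n * phi)

# Positive exponent: modulus sqrt(2), argument pi/4, fourth power.
print(de_moivre(math.sqrt(2), math.pi / 4, 4))

# Negative exponent: compare with direct exponentiation of the same number.
z = cmath.rect(3, math.pi / 5)
print(de_moivre(3, math.pi / 5, -3), z ** -3)

# The binomial route for a number given as a + bi.
print(complex(2, 5) ** 3)
```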
A special case of De Moivre’s formula, namely, the equation
$$(\cos\varphi + i\sin\varphi)^n = \cos n\varphi + i\sin n\varphi$$
permits finding with ease formulas for the sine and cosine of a multiple
angle. Indeed, expanding the left member of this equation
by the binomial formula and equating the real and imaginary parts
of both sides separately, we obtain
$$\cos n\varphi = \cos^n\varphi - \binom{n}{2}\cos^{n-2}\varphi\,\sin^2\varphi + \binom{n}{4}\cos^{n-4}\varphi\,\sin^4\varphi - \dots,$$
$$\sin n\varphi = \binom{n}{1}\cos^{n-1}\varphi\,\sin\varphi - \binom{n}{3}\cos^{n-3}\varphi\,\sin^3\varphi + \binom{n}{5}\cos^{n-5}\varphi\,\sin^5\varphi - \dots$$
Here, $\binom{n}{k}$ is the usual notation for a binomial coefficient:
$$\binom{n}{k} = \frac{n(n-1)(n-2)\dots(n-k+1)}{1\cdot 2\cdot 3\cdots k}$$
For n = 2 we arrive at the familiar formulas
$$\cos 2\varphi = \cos^2\varphi - \sin^2\varphi, \quad \sin 2\varphi = 2\cos\varphi\sin\varphi$$
and for n = 3 we obtain the formulas
$$\cos 3\varphi = \cos^3\varphi - 3\cos\varphi\sin^2\varphi, \quad \sin 3\varphi = 3\cos^2\varphi\sin\varphi - \sin^3\varphi$$
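The multiple-angle expansions can be evaluated term by term and compared with the library values of the cosine and sine. This is a sketch, assuming Python 3.8 or later for `math.comb`; the function name `cos_sin_multiple` is illustrative. Even values of k contribute to the real part with sign $(-1)^{k/2}$, odd values to the imaginary part with sign $(-1)^{(k-1)/2}$.

```python
import math

def cos_sin_multiple(n, phi):
    """Return (cos n*phi, sin n*phi) from the binomial expansion of (cos phi + i sin phi)^n."""
    c, s = math.cos(phi), math.sin(phi)
    cos_n = sum((-1) ** (k // 2) * math.comb(n, k) * c ** (n - k) * s ** k
                for k in range(0, n + 1, 2))
    sin_n = sum((-1) ** ((k - 1) // 2) * math.comb(n, k) * c ** (n - k) * s ** k
                for k in range(1, n + 1, 2))
    return cos_n, sin_n

print(cos_sin_multiple(3, 0.7), (math.cos(2.1), math.sin(2.1)))
```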
Extracting roots of complex numbers is a far more difficult task.
Let us start with the square root of the number $\alpha = a + bi$. As yet
we do not know whether there exists a complex number whose
square is equal to $\alpha$. Let us assume that such a number $u + vi$
exists; that is, using conventional symbols, we can write
$$\sqrt{a + bi} = u + vi$$
From the equation
$$(u + vi)^2 = a + bi$$
it follows that
$$u^2 - v^2 = a, \qquad 2uv = b \tag{2}$$
Squaring both sides of each of the equations of (2) and then adding,
we get
$$(u^2 - v^2)^2 + 4u^2v^2 = (u^2 + v^2)^2 = a^2 + b^2$$
whence
$$u^2 + v^2 = +\sqrt{a^2 + b^2}$$
The plus sign is taken because the numbers $u$ and $v$ are real and
therefore the left member of the equation is positive. From this
equation and from the first of the equations of (2), we get
$$u^2 = \frac{1}{2}\left(a + \sqrt{a^2 + b^2}\right), \qquad v^2 = \frac{1}{2}\left(-a + \sqrt{a^2 + b^2}\right)$$

Thus, extracting the square roots we get two values for $u$ which
differ in sign and also two values for $v$. All these values will be real
since the square roots are extracted from nonnegative numbers for any
$a$ and $b$. The values obtained for $u$ and $v$ cannot be combined in arbitrary
fashion, since, by the second equation of (2), the sign of the
product $uv$ must coincide with the sign of $b$. This yields two possible
combinations of values of $u$ and $v$, that is, two numbers of the form
$u + vi$ which can serve as values of the square root of the number $\alpha$;
these numbers differ in sign. An elementary though unwieldy check
(squaring the resulting numbers separately for the case $b > 0$ and
$b < 0$) shows that the numbers we found are indeed the values of the
square root of the number $\alpha$. Thus, taking the square root of a complex
number is always possible and yields two values which differ in sign.

In particular, it now becomes possible to extract the square root
of a negative real number; the values of this root will be pure imaginaries.
Indeed, if $a < 0$ and $b = 0$, then $\sqrt{a^2 + b^2} = -a$, since
this root must be positive, but then $u^2 = \frac{1}{2}(a - a) = 0$, that
is, $u = 0$, whence $\sqrt{a} = \pm vi$.

Example. Let $\alpha = 21 - 20i$. Then $\sqrt{a^2 + b^2} = \sqrt{441 + 400} = 29$. Therefore,
$u^2 = \frac{1}{2}(21 + 29) = 25$, $v^2 = \frac{1}{2}(-21 + 29) = 4$, whence $u = \pm 5$,
$v = \pm 2$. The signs of $u$ and $v$ must be different since $b$ is negative, therefore
$$\sqrt{21 - 20i} = \pm(5 - 2i)$$
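The square-root procedure just described translates directly into a few lines of code. The sketch below assumes floating-point arithmetic and Python's `math` module; the function name `complex_sqrt` and the convention of returning both roots as pairs (u, v) are illustrative choices. The sign of v is chosen so that the product uv has the sign of b.

```python
import math

def complex_sqrt(a, b):
    """Both square roots of a + bi, each returned as a pair (u, v) meaning u + vi."""
    m = math.hypot(a, b)                      # +sqrt(a^2 + b^2)
    u = math.sqrt(max(0.0, (a + m) / 2))      # u^2 = (a + m)/2
    v = math.sqrt(max(0.0, (-a + m) / 2))     # v^2 = (-a + m)/2
    if b < 0:
        v = -v                                # uv must have the sign of b
    return (u, v), (-u, -v)

print(complex_sqrt(21, -20))   # the example from the text
print(complex_sqrt(-4, 0))     # a negative real number: pure imaginary roots
```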


Attempts to extract higher (than second) roots of complex num- 
bers given in the form a + bi encounter insuperable difficulties. 
Thus, if we wished to extract the cube root of a number a + bi, 
we would first have to solve some auxiliary cubic equation, which 
we are as yet unable to do, and which in turn would require, as we 
shall see in Sec. 38, the extraction of the cube root of a complex 
number. On the other hand, the trigonometric form is extremely well 
suited to extracting roots of any degree. Using the trigonometric 
form we will now exhaust this problem completely. 

Let it be required to extract the nth root of a number
$\alpha = r(\cos\varphi + i\sin\varphi)$. Let us assume that this is possible and that
we get the number $\rho(\cos\theta + i\sin\theta)$, that is
$$[\rho(\cos\theta + i\sin\theta)]^n = r(\cos\varphi + i\sin\varphi) \tag{3}$$
Then, by De Moivre’s formula, $\rho^n = r$, that is $\rho = \sqrt[n]{r}$, where
the right member contains a uniquely determined positive value
of the nth root of the positive real number $r$. On the other hand, the
argument of the left member of (3) is $n\theta$. We cannot assert, however,
that $n\theta$ is equal to $\varphi$, since these angles may actually differ by some
integral multiple of $2\pi$. Therefore, $n\theta = \varphi + 2k\pi$, where $k$ is an
integer, whence
$$\theta = \frac{\varphi + 2k\pi}{n}$$

Conversely, if we take the number $\sqrt[n]{r}\left(\cos\frac{\varphi + 2k\pi}{n} + i\sin\frac{\varphi + 2k\pi}{n}\right)$,
then for any integral $k$, positive or negative, the nth power of
this number is equal to $\alpha$. Thus
$$\sqrt[n]{r(\cos\varphi + i\sin\varphi)} = \sqrt[n]{r}\left(\cos\frac{\varphi + 2k\pi}{n} + i\sin\frac{\varphi + 2k\pi}{n}\right) \tag{4}$$

Assigning different values to $k$, we will not always get distinct
values of the required root. Indeed, for
$$k = 0, 1, 2, \dots, n - 1 \tag{5}$$
we get n values of the root, all distinct, since increasing $k$ by unity
implies increasing the argument by $\frac{2\pi}{n}$. Now let $k$ be arbitrary.
If $k = nq + r$, $0 \le r \le n - 1$, then
$$\frac{\varphi + 2k\pi}{n} = \frac{\varphi + 2(nq + r)\pi}{n} = \frac{\varphi + 2r\pi}{n} + 2q\pi$$
In other words, the value of the argument for our $k$ differs from the
value of the argument for $k = r$ by a multiple of $2\pi$. We thus obtain
the same value of the root as for the value of $k$ equal to $r$, that is,
such as lies in the set (5).

Thus, extracting the nth root of a complex number $\alpha$ is always
possible and yields n distinct values. All values of the nth root lie on
a circle of radius $\sqrt[n]{|\alpha|}$ with centre at zero and divide the circle into
n equal parts.

In particular, the nth root of a real number $a$ also has n distinct
values, of which two, one, or none will be real, depending on the
sign of $a$ and the parity of n.
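Formula (4) with k running over the set (5) gives a direct recipe for listing all n values of the root. The sketch below assumes Python's `cmath` and `math` modules and floating-point arithmetic; the function name `nth_roots` is illustrative, and the argument used is the particular value returned by `cmath.polar`. The number 0 is not handled separately here, since its argument is undefined.

```python
import cmath
import math

def nth_roots(alpha, n):
    """All n values of the nth root of a nonzero complex number alpha, by formula (4)."""
    r, phi = cmath.polar(alpha)          # modulus and one value of the argument
    rho = r ** (1.0 / n)                 # the positive nth root of the modulus
    return [cmath.rect(rho, (phi + 2 * k * math.pi) / n) for k in range(n)]

roots = nth_roots(-8, 3)
for beta in roots:
    print(beta, beta ** 3)               # each cube is close to -8
```

All the printed roots have modulus 2, in agreement with the statement that the values lie on a circle of radius equal to the nth root of the modulus.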

Examples. 


(A p= // 2( ee Genel 
) = cos 7A in Z )=V 
8/5 N A 
k =Q: BPo=y 2 (cos +isin g) ; 
zae maea (eee nirs a 
k=1: Br=y 2 (cos 7p tisin Fs z); 


k=2; Poe V2 cos att tings x) i 


. 


hd 


Sn 2kn Ë np2kn 
3\ cos +—— +i sin——3—— 


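(2) $\beta = \sqrt{i} = \sqrt{\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}} = \cos\frac{\frac{\pi}{2} + 2k\pi}{2} + i\sin\frac{\frac{\pi}{2} + 2k\pi}{2}$;

$\beta_0 = \cos\frac{\pi}{4} + i\sin\frac{\pi}{4} = \frac{\sqrt{2}}{2} + i\frac{\sqrt{2}}{2}$, $\beta_1 = \cos\frac{5\pi}{4} + i\sin\frac{5\pi}{4} = -\frac{\sqrt{2}}{2} - i\frac{\sqrt{2}}{2}$.

(3) $\beta = \sqrt[3]{-8} = \sqrt[3]{8(\cos\pi + i\sin\pi)} = 2\left(\cos\frac{\pi + 2k\pi}{3} + i\sin\frac{\pi + 2k\pi}{3}\right)$;

$\beta_0 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right) = 1 + i\sqrt{3}$;
$\beta_1 = 2(\cos\pi + i\sin\pi) = -2$;
$\beta_2 = 2\left(\cos\frac{5\pi}{3} + i\sin\frac{5\pi}{3}\right) = 1 - i\sqrt{3}$.

Roots of unity. Of particular importance is the case of extracting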
the nth root of unity. This root has n values, and, because of the
equation $1 = \cos 0 + i\sin 0$ and formula (4), all these values or,
as we shall say, all the nth roots of unity, are given by the formula
$$\sqrt[n]{1} = \cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n}, \quad k = 0, 1, \dots, n - 1 \tag{6}$$
The real values of the nth root of unity are obtained from formula (6)
for the values $k = 0$ and $k = \frac{n}{2}$ if n is even, and for $k = 0$ if n is odd.
In the complex plane, the nth roots of unity are located on the circumference
of the unit circle and divide it into n equal arcs; one
of the division points is the number 1. From this it follows that
those of the nth roots of unity which are not real are situated symmetrically
about the real axis (that is, are pairwise conjugate).

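The square root of unity has two values: 1 and $-1$; the
fourth root of unity has four values: 1, $-1$, $i$ and $-i$. It is
advisable for what follows to memorize the values of the cube
root of unity. By (6), the roots are $\cos\frac{2k\pi}{3} + i\sin\frac{2k\pi}{3}$, where $k = 0, 1, 2$;
that is, besides unity, the conjugate numbers
$$\varepsilon_1 = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3} = -\frac{1}{2} + i\frac{\sqrt{3}}{2}, \qquad \varepsilon_2 = \cos\frac{4\pi}{3} + i\sin\frac{4\pi}{3} = -\frac{1}{2} - i\frac{\sqrt{3}}{2} \tag{7}$$
as well.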

All values of the nth root of a complex number $\alpha$ may be obtained
by multiplying one of these values by all the nth roots of unity. Indeed,
let $\beta$ be one of the values of the nth root of the number $\alpha$, i.e.,
$\beta^n = \alpha$, and let $\varepsilon$ be an arbitrary value of the nth root of unity, that
is, $\varepsilon^n = 1$. Then $(\beta\varepsilon)^n = \beta^n\varepsilon^n = \alpha$. Thus $\beta\varepsilon$ is also one of the
values for $\sqrt[n]{\alpha}$. Multiplying $\beta$ by each of the nth roots of unity, we
get n distinct values of the nth root of the number $\alpha$, that is, all
the values of this root.


Example 1. One of the values of the cube root of $-8$ is $-2$. The two others
are, by (7), the numbers $-2\varepsilon_1 = 1 - i\sqrt{3}$ and $-2\varepsilon_2 = 1 + i\sqrt{3}$ (see Example 3
above).

Example 2. $\sqrt[4]{81}$ has four values: 3, $-3$, $3i$, $-3i$.


The product of two nth roots of unity is itself an nth root of unity.
Indeed, if $\varepsilon^n = 1$ and $\eta^n = 1$, then $(\varepsilon\eta)^n = \varepsilon^n\eta^n = 1$. Also, the
reciprocal of an nth root of unity is itself such a root. Let $\varepsilon^n = 1$. Then
from $\varepsilon\cdot\varepsilon^{-1} = 1$ it follows that $\varepsilon^n\cdot(\varepsilon^{-1})^n = 1$, that is, $(\varepsilon^{-1})^n = 1$.
Generally, any power of an nth root of unity is also an nth root of
unity.

Any kth root of unity will also be an lth root of unity for any l
that is a multiple of k. Whence it follows that if we regard the entire
collection of nth roots of unity, then some of these roots will already
be n'-th roots of unity for some n' which are divisors of the number n.
However, for any n, there exist nth roots of unity which
are not roots of unity of any lower degree. These roots are termed primitive
nth roots of unity. Their existence follows from formula (6): if the
value of a root corresponding to a given value of k is denoted by $\varepsilon_k$
(so that $\varepsilon_0 = 1$), then on the basis of De Moivre’s formula (1),
$$\varepsilon_1^k = \varepsilon_k$$
Thus, no power of $\varepsilon_1$ less than the nth will be equal to 1, that is,
$\varepsilon_1 = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$ is a primitive root.

An nth root $\varepsilon$ of unity is a primitive nth root if and only if its powers
$\varepsilon^k$, $k = 0, 1, \dots, n - 1$, are distinct, that is, if they exhaust all
the nth roots of unity.

Indeed, if all the indicated powers of the number $\varepsilon$ are distinct,
then $\varepsilon$ is obviously a primitive nth root. But if, for example,
$\varepsilon^k = \varepsilon^l$ for $0 \le k < l \le n - 1$, then $\varepsilon^{l-k} = 1$; that is, because of the
inequalities $1 \le l - k \le n - 1$, the root $\varepsilon$ will not be primitive.

The number $\varepsilon_1$ found above is not, in the general case, the only
primitive nth root. The following theorem is used to find all of these
roots.

If $\varepsilon$ is a primitive nth root of unity, then the number $\varepsilon^k$ is a primitive
nth root if and only if k is relatively prime to n.

Let d be the greatest common divisor of the numbers k and n.
If $d > 1$ and $k = dk'$, $n = dn'$, then
$$(\varepsilon^k)^{n'} = \varepsilon^{kn'} = \varepsilon^{k'n} = (\varepsilon^n)^{k'} = 1$$
that is, the root $\varepsilon^k$ is an n'-th root of unity.

On the other hand, let $d = 1$ and at the same time let the number
$\varepsilon^k$ be an mth root of unity, $1 \le m < n$. Thus,
$$(\varepsilon^k)^m = \varepsilon^{km} = 1$$
Since the number $\varepsilon$ is a primitive nth root of unity, that is, only its
powers with exponents that are multiples of n can be equal to unity,
it follows that the number km is a multiple of n. But since $1 \le m < n$,
the numbers k and n cannot be relatively prime; this contradicts
the assumption.

Thus, the number of primitive nth roots of unity is equal to the
number of positive integers k less than n and relatively prime to n.
The expression for this number, which is ordinarily denoted by
$\varphi(n)$, may be found in any course of number theory.

If p is a prime number, then all the pth roots of unity except unity itself
will be primitive pth roots of unity. On the other hand, $i$ and $-i$
(but not 1 and $-1$) are the primitive fourth roots of unity.
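The criterion for primitive roots can be compared with a direct computation of the order of each root. This is a sketch, assuming Python's `cmath` and `math` modules; the names `roots_of_unity`, `primitive_exponents` and `order`, and the numerical tolerance, are illustrative assumptions. The exponent k labels the root obtained from formula (6) of this section.

```python
import cmath
import math

def roots_of_unity(n):
    """The nth roots of unity, indexed by k = 0, 1, ..., n - 1."""
    return [cmath.rect(1, 2 * k * math.pi / n) for k in range(n)]

def primitive_exponents(n):
    """Exponents k for which the kth root in the list above is primitive: gcd(k, n) = 1."""
    return [k for k in range(n) if math.gcd(k, n) == 1]

def order(eps, n, tol=1e-9):
    """Smallest m in 1..n with eps^m = 1 (up to rounding), or None if there is none."""
    p = 1
    for m in range(1, n + 1):
        p *= eps
        if abs(p - 1) < tol:
            return m
    return None

n = 12
by_gcd = primitive_exponents(n)
by_order = [k for k, eps in enumerate(roots_of_unity(n)) if order(eps, n) == n]
print(by_gcd, by_order)   # the two lists agree
```

The length of the list returned by `primitive_exponents(n)` is the number of primitive nth roots of unity discussed above.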


CHAPTER 5 


POLYNOMIALS 
AND THEIR ROOTS 


20. Operations on Polynomials 


The content of the first two chapters of this book—the theory 
of determinants and the theory of systems of linear equations— 
grew out of the elementary school course of algebra which proceeds 
from one equation of the first degree in one unknown to systems of
two and three equations of the first degree in two and three unknowns 
respectively. The second branch of elementary algebra, which in that 
setting appeared to be the more important one, consisted in passing 
from first-degree equations in one unknown to an arbitrary quadratic 
equation again in one unknown, and on to certain special types 
of equations of the third and fourth degree. This trend is further 
developed into a very extensive and rich branch of higher algebra 
devoted to the study of arbitrary equations of the nth degree in one 
unknown. This division of algebra, which is historically the earlier 
one, is treated in the present chapter and in some of the later chap- 
ters of this text. 

The general form of an nth-degree equation (n a positive integer) is
$$a_0x^n + a_1x^{n-1} + \dots + a_{n-1}x + a_n = 0 \tag{1}$$
The coefficients $a_0, a_1, \dots, a_{n-1}, a_n$ of this equation will be
considered to be arbitrary complex numbers and the leading coefficient
$a_0$ must be nonzero.

If an equation like (1) is written, it is assumed that we have to
solve it. In other words, we have to find numerical values for the
unknown $x$ that satisfy the equation, that is, values which, when
substituted in place of the unknown and after all indicated operations
have been carried out, reduce the left member of (1) to zero.

However, it is advisable to replace the problem of solving equa- 
tion (1) by the more general one of studying the left member of this 
equation: 


$$a_0x^n + a_1x^{n-1} + \dots + a_{n-1}x + a_n \tag{2}$$
which is called a polynomial of degree n in the unknown $x$. Remember
that only expressions like (2) are polynomials, that is, only the sum
of integral nonnegative powers of the unknown $x$ taken with certain
numerical coefficients, and not just any sum of monomials, as was
the case in elementary algebra. In particular, we will not consider
as polynomials expressions which contain negative or fractional


powers of the unknown $x$, such as $2x^2 - \frac{1}{x} + 3$ or $ax^{-3} + bx^{-2} + cx^{-1} + d + ex + fx^2$
or $x^{1/2} + 1$. For brevity, we will denote
polynomials by the symbols $f(x)$, $g(x)$, $\varphi(x)$, and so on.

Two polynomials f (xz) and g (x) will be considered equal (or 
identically equal), f (x) = g (x), only when the coefficients of like 
powers of the unknown are equal. To be specific, no polynomial can 
be equal to zero if at least one coefficient is nonzero and for this 
reason, the equality sign used in the notation (1) of an nth-degree 
equation has no connection with the above-defined equality of poly- 
nomials. The = sign connecting polynomials will always be under- 
stood in the sense of an identical equality of these polynomials. 

Thus, we look upon the nth-degree polynomial (2) as a certain
formal expression, fully defined by the set of its coefficients
$a_0, a_1, \dots, a_n$, where $a_0 \neq 0$. The exact meaning of these words will
be explained in Chapter 10. Note that aside from the notation of
a polynomial given in (2) (in descending powers of the unknown x),
we may use other notations obtainable from (2) by a rearrangement
of the terms, say, in ascending powers of the unknown.

There is of course the possibility of regarding the polynomial
(2) from the viewpoint of mathematical analysis and of considering
it to be a complex function of a complex variable x. However, we
have to bear in mind that two functions are considered equal if
their values for all values of the variable x are equal. It is clear
that two polynomials which are equal in the above-mentioned formal
algebraic sense will also be equal as functions of x. The converse
will be proved only in Sec. 24, however. After that the algebraic
and function-theoretic viewpoints on the concept of a polynomial
with numerical coefficients will indeed be equivalent. For the time
being, however, we have to indicate precisely each time which sense
is meant. In the present section and the two following sections we
will look upon the polynomial as a formal-algebraic expression.

Naturally, there are nth-degree polynomials for any natural
number n. We consider all possible polynomials of this kind:
first-degree (or linear), quadratic, cubic, etc. We will also encounter
polynomials of degree zero, which are nonzero complex numbers. The
number zero will also be taken to be a polynomial. This is the
only polynomial whose degree is not defined.

For polynomials with complex coefficients we now define the
operations of addition and multiplication. These operations will be
introduced using the pattern of operations involving polynomials
with real coefficients, which are familiar from the course of
elementary algebra.

If we are given polynomials f (x) and g (x) with complex
coefficients (written, for convenience, in ascending powers of x):

$$f(x) = a_0 + a_1x + \dots + a_{n-1}x^{n-1} + a_nx^n, \quad a_n \neq 0,$$
$$g(x) = b_0 + b_1x + \dots + b_{s-1}x^{s-1} + b_sx^s, \quad b_s \neq 0$$

and if, for example, $n \geq s$, then their sum is the polynomial

$$f(x) + g(x) = c_0 + c_1x + \dots + c_{n-1}x^{n-1} + c_nx^n$$

whose coefficients are obtained by adding the coefficients of like
powers of the unknown in f (x) and g (x), i.e.,

$$c_i = a_i + b_i, \quad i = 0, 1, \dots, n \tag{3}$$

For n > s, the coefficients $b_{s+1}, b_{s+2}, \dots, b_n$ are to be taken equal
to zero. The degree of the sum will be equal to n if n is greater than
s, but for n = s it may accidentally prove less than n, namely,
when $b_n = -a_n$.


The product of the polynomials f (x) and g (x) is the polynomial

$$f(x) \cdot g(x) = d_0 + d_1x + \dots + d_{n+s-1}x^{n+s-1} + d_{n+s}x^{n+s}$$

whose coefficients are determined as follows:

$$d_i = \sum_{k+l=i} a_kb_l, \quad i = 0, 1, \dots, n+s-1, n+s \tag{4}$$

That is, the coefficient $d_i$ is obtained by multiplying those
coefficients of the polynomials f (x) and g (x) whose indices sum
to i and adding all such products; in particular, $d_0 = a_0b_0$,
$d_1 = a_0b_1 + a_1b_0$, ..., $d_{n+s} = a_nb_s$. From the latter
equality follows the inequality $d_{n+s} \neq 0$, and therefore the degree
of the product of two polynomials is equal to the sum of the degrees
of these polynomials.

From this it follows that the product of polynomials different 
from zero can never be equal to zero. 
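
Formulas (3) and (4) can be tried out numerically. The following is a minimal Python sketch, assuming a polynomial is stored as the list of its coefficients in ascending powers and the zero polynomial as the empty list; the names trim, poly_add and poly_mul are illustrative and do not come from the text.

```python
# Illustrative helpers (names are assumptions, not from the text).
# A polynomial a_0 + a_1 x + ... + a_n x^n is the list [a_0, a_1, ..., a_n].

def trim(p):
    """Drop trailing zero coefficients so the last entry is the leading one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p  # the zero polynomial becomes the empty list


def poly_add(f, g):
    """Formula (3): c_i = a_i + b_i, missing coefficients being taken as 0."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])


def poly_mul(f, g):
    """Formula (4): d_i is the sum of a_k * b_l over all k + l = i."""
    f, g = trim(f), trim(g)
    if not f or not g:
        return []
    d = [0] * (len(f) + len(g) - 1)
    for k, a in enumerate(f):
        for l, b in enumerate(g):
            d[k + l] += a * b
    return trim(d)


f = [1, 2j, 3]   # 1 + 2i x + 3 x^2
g = [0, 1, -3]   # x - 3 x^2
print(poly_add(f, g))  # the leading terms cancel, so the degree drops below 2
print(poly_mul(f, g))  # five coefficients: the degree is 2 + 2 = 4
```

The example is chosen so that the sum illustrates the "accidental" drop in degree, while the product has degree equal to the sum of the degrees.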

What properties do these operations that we have introduced 
for polynomials have? The commutative and associative laws for 
addition follow immediately from the validity of these properties 
for addition of numbers, since we add the coefficients of each power 
of the unknown separately. Subtraction is possible: the role of zero 
is played by the number zero, which we have included in the class 
of polynomials, and the opposite of f (x) will be the polynomial

$$-f(x) = -a_0 - a_1x - \dots - a_{n-1}x^{n-1} - a_nx^n$$


The commutative law for multiplication follows from the
commutativity of multiplication of numbers and from the fact that
in the definition of a product of polynomials, the coefficients of
both factors f (x) and g (x) are of equal status. The associativity
of multiplication is proved as follows: if besides the above-written
polynomials f (x) and g (x), we are given the polynomial

$$h(x) = c_0 + c_1x + \dots + c_{t-1}x^{t-1} + c_tx^t, \quad c_t \neq 0$$

then the coefficient of $x^i$, i = 0, 1, ..., n + s + t, in the product
[f (x) g (x)] h (x) is the number

$$\sum_{j+m=i} \Big( \sum_{k+l=j} a_kb_l \Big) c_m = \sum_{k+l+m=i} a_kb_lc_m$$

and in the product f (x) [g (x) h (x)] it is the equal number

$$\sum_{k+j=i} a_k \Big( \sum_{l+m=j} b_lc_m \Big) = \sum_{k+l+m=i} a_kb_lc_m$$

Finally, the validity of the distributive law follows from the
equation

$$\sum_{k+l=i} (a_k + b_k) c_l = \sum_{k+l=i} a_kc_l + \sum_{k+l=i} b_kc_l$$

since the left-hand member of this equation is the coefficient of $x^i$
in the polynomial [f (x) + g (x)] h (x) and the right-hand member
is the coefficient of the same power of the unknown in the
polynomial f (x) h (x) + g (x) h (x).

It will be noted that in the multiplication of polynomials the
role of unity is played by the number 1, which is regarded as a
polynomial of degree zero. On the other hand, a polynomial f (x) has
an inverse $f^{-1}(x)$,

$$f(x) f^{-1}(x) = 1 \tag{5}$$

if and only if f (x) is a polynomial of degree zero. Indeed, if f (x) is
a nonzero number a, then the inverse polynomial is the number $a^{-1}$.
But if f (x) has degree $n \geq 1$, then the degree of the left side of (5)
would not be less than n if the polynomial $f^{-1}(x)$ existed, whereas
the polynomial on the right is a polynomial of degree zero.

Consequently, the multiplication of polynomials has no inverse 
operation (division). In this respect, the set of all polynomials with 
complex coefficients resembles the set of all integers. The analogy 
may be continued in that polynomials, like the integers, have 
a division algorithm (with remainder). Elementary algebra describes 
this algorithm for the case of polynomials with real coefficients. 
However, since we are dealing with polynomials with complex 
coefficients, it is well to review once again all the statements and 
to carry out the proofs. 

For any two polynomials f (x) and g (x) we can find polynomials
q (x) and r (x) such that

$$f(x) = g(x) q(x) + r(x) \tag{6}$$

the degree of r (x) being less than the degree of g (x), or r (x) = 0.
The polynomials q (x) and r (x) satisfying this condition are defined
uniquely.

Let us first prove the latter half of the theorem. Let there also
be polynomials $\bar{q}(x)$ and $\bar{r}(x)$ that likewise satisfy the equation

$$f(x) = g(x) \bar{q}(x) + \bar{r}(x) \tag{7}$$

the degree of $\bar{r}(x)$ again being less than the degree of g (x) (or
$\bar{r}(x) = 0$; this case will not be specifically stated in the sequel).
Equating the right sides of (6) and (7), we obtain

$$g(x) [q(x) - \bar{q}(x)] = \bar{r}(x) - r(x)$$

The degree of the right side of this equation is less than the degree
of g (x), but the degree of the left side would be greater than or equal
to the degree of g (x) for $q(x) - \bar{q}(x) \neq 0$. Therefore, it must be
true that $q(x) - \bar{q}(x) = 0$, that is, $q(x) = \bar{q}(x)$, but then
$r(x) = \bar{r}(x)$, which is what we set out to prove.

We now prove the first part of the theorem. Let the polynomials
f (x) and g (x) have degrees n and s, respectively. If n < s, then we
can put q (x) = 0, r (x) = f (x). But if $n \geq s$, then we take advantage
of the same method by which in elementary algebra we divide
polynomials with real coefficients (arranged in descending powers of
the unknown). Suppose

$$f(x) = a_0x^n + a_1x^{n-1} + \dots + a_{n-1}x + a_n, \quad a_0 \neq 0,$$
$$g(x) = b_0x^s + b_1x^{s-1} + \dots + b_{s-1}x + b_s, \quad b_0 \neq 0$$

Setting

$$f(x) - \frac{a_0}{b_0} x^{n-s} g(x) = f_1(x) \tag{8}$$

we get a polynomial whose degree is less than n. Denote this degree
by $n_1$ and the leading coefficient of the polynomial $f_1(x)$ by $a_{10}$.
Now, if we still have $n_1 \geq s$, set

$$f_1(x) - \frac{a_{10}}{b_0} x^{n_1-s} g(x) = f_2(x) \tag{$8_1$}$$

Denoting by $n_2$ the degree and by $a_{20}$ the leading coefficient of the
polynomial $f_2(x)$, we set

$$f_2(x) - \frac{a_{20}}{b_0} x^{n_2-s} g(x) = f_3(x) \tag{$8_2$}$$

and so forth.

Since the degrees of the polynomials $f_1(x), f_2(x), \dots$ decrease,
$n > n_1 > n_2 > \dots$, we finally arrive (after a finite number of steps)
at the polynomial $f_k(x)$,

$$f_{k-1}(x) - \frac{a_{k-1,0}}{b_0} x^{n_{k-1}-s} g(x) = f_k(x) \tag{$8_{k-1}$}$$

the degree of which, $n_k$, is less than s. Our procedure has come to
a halt. Now adding (8), ($8_1$), ..., ($8_{k-1}$), we get

$$f(x) - \Big( \frac{a_0}{b_0} x^{n-s} + \frac{a_{10}}{b_0} x^{n_1-s} + \dots + \frac{a_{k-1,0}}{b_0} x^{n_{k-1}-s} \Big) g(x) = f_k(x)$$

Thus, the polynomials

$$q(x) = \frac{a_0}{b_0} x^{n-s} + \frac{a_{10}}{b_0} x^{n_1-s} + \dots + \frac{a_{k-1,0}}{b_0} x^{n_{k-1}-s}, \qquad r(x) = f_k(x)$$

do indeed satisfy (6), and the degree of r (x) is in fact less than the
degree of g (x).

Note that the polynomial q (x) is called the quotient obtained 
from the division of f (x) by g (x), and r (x) is the remainder. 
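
The successive cancellation of leading terms in the proof can be sketched in Python as follows. This is only an illustration under stated assumptions: coefficients are listed in descending powers, they are restricted to rational numbers (via fractions.Fraction) so that the arithmetic is exact, whereas the text allows arbitrary complex coefficients, and the names strip_leading_zeros and poly_divmod are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch; function names are assumptions, not from the text.
# A polynomial a_0 x^n + ... + a_n is the list [a_0, ..., a_n] (descending).


def strip_leading_zeros(p):
    """Remove zero coefficients standing in front of the leading one."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_divmod(f, g):
    """Return (q, r) with f = g*q + r and deg r < deg g, or r = [] (zero).

    Each pass of the loop cancels the current leading term with a multiple
    of g, in the manner of the steps (8), (8_1), ... of the proof.
    """
    f = [Fraction(a) for a in f]
    g = [Fraction(b) for b in g]
    if not g or g[0] == 0:
        raise ValueError("g must have a nonzero leading coefficient")
    n, s = len(f) - 1, len(g) - 1
    if n < s:
        return [], strip_leading_zeros(f)   # q = 0, r = f
    r, q = f[:], []
    for i in range(n - s + 1):
        coeff = r[i] / g[0]                 # leading coefficient over b_0
        q.append(coeff)
        for j in range(s + 1):
            r[i + j] -= coeff * g[j]
    return q, strip_leading_zeros(r[n - s + 1:])


# x^3 - 2x + 5 divided by 2x - 1
q, r = poly_divmod([1, 0, -2, 5], [2, -1])
print(q, r)
```

If both inputs have rational coefficients, so do the quotient and remainder, since only the four arithmetic operations on the coefficients are used.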

From this consideration of the division algorithm, it is easy
to establish that if f (x) and g (x) are polynomials with real coefficients,
then the coefficients of all polynomials $f_1(x), f_2(x), \dots$ and therefore
also the coefficients of the quotient q (x) and the remainder r (x) will
be real.

Suppose we have nonzero polynomials f (x) and φ (x) with complex
coefficients. If the remainder after dividing f (x) by φ (x) is
zero, we then say that f (x) is divisible (exactly divisible) by φ (x).
Here, the polynomial φ (x) is called a divisor of the polynomial
f (x).

The polynomial φ (x) is a divisor of the polynomial f (x) if and
only if there exists a polynomial ψ (x) that satisfies the equation

$$f(x) = \varphi(x) \psi(x) \tag{1}$$

Indeed, if φ (x) is a divisor of f (x), then for ψ (x) we should take
the quotient of f (x) divided by φ (x). Conversely, let there be a
polynomial ψ (x) which satisfies (1). From the proof given in the
preceding section on the uniqueness of the polynomials q (x) and r (x)
which satisfy the equation

$$f(x) = \varphi(x) q(x) + r(x)$$

and the condition that the degree of r (x) be less than the degree
of φ (x), it follows in our case that the quotient of f (x) by φ (x)
is equal to ψ (x), and the remainder is zero.

Naturally, if equation (1) holds, then ψ (x) is also a divisor
of f (x). Furthermore, it is obvious that the degree of φ (x) does not
exceed the degree of f (x).

Note that if the polynomial f (x) and its divisor φ (x) both have
rational or real coefficients, then the polynomial ψ (x) as well will
have rational or, respectively, real coefficients since it is found
by means of the division algorithm. Of course, a polynomial with
rational or real coefficients can also have divisors not all of whose
coefficients are rational (or real). This is shown for example
by the equation

$$x^2 + 1 = (x - i)(x + i)$$


We indicate a few basic properties of divisibility of polynomials 
that will be very useful later on. 

I. If f (x) is divisible by g (x), and g (x) is divisible by h (x), then
f (x) is divisible by h (x).

Since, by hypothesis, f (x) = g (x) φ (x) and g (x) = h (x) ψ (x),
it follows that f (x) = h (x) [ψ (x) φ (x)].

II. If f (x) and g (x) are divisible by φ (x), then their sum and
difference are also divisible by φ (x).

Indeed, from the equations f (x) = φ (x) ψ (x) and g (x) = φ (x) χ (x)
it follows that f (x) ± g (x) = φ (x) [ψ (x) ± χ (x)].

III. If f (x) is divisible by φ (x), then the product of f (x) by any
polynomial g (x) is also divisible by φ (x).

Indeed, if f (x) = φ (x) ψ (x), then it follows that
f (x) g (x) = φ (x) [ψ (x) g (x)].

From II and III we have the following property.

IV. If each of the polynomials $f_1(x), f_2(x), \dots, f_k(x)$ is
divisible by φ (x), then the following polynomial will also be divisible
by φ (x):

$$f_1(x) g_1(x) + f_2(x) g_2(x) + \dots + f_k(x) g_k(x)$$

where $g_1(x), g_2(x), \dots, g_k(x)$ are arbitrary polynomials.

V. Any polynomial f (x) is divisible by any polynomial of degree
zero.

Indeed, if $f(x) = a_0x^n + a_1x^{n-1} + \dots + a_n$ and c is an
arbitrary number not equal to zero, that is, an arbitrary polynomial of
degree zero, then

$$f(x) = c \Big( \frac{a_0}{c} x^n + \frac{a_1}{c} x^{n-1} + \dots + \frac{a_n}{c} \Big)$$

VI. If f (x) is divisible by φ (x), then f (x) is divisible by cφ (x)
as well, where c is an arbitrary number different from zero.

From the equation f (x) = φ (x) ψ (x) follows the equation
$f(x) = [c\varphi(x)] \cdot [c^{-1}\psi(x)]$.

VII. The polynomials cf (x), $c \neq 0$, and only such polynomials,
are the divisors of the polynomial f (x) that have the same degree as f (x).

Indeed, $f(x) = c^{-1}[cf(x)]$, so that f (x) is divisible by cf (x).

If, on the other hand, f (x) is divisible by φ (x), and the degrees
of f (x) and φ (x) coincide, then the degree of the quotient of f (x)
by φ (x) must be zero, i.e., f (x) = dφ (x), $d \neq 0$, whence
$\varphi(x) = d^{-1}f(x)$.

From this we get the following property.

VIII. The polynomials f (x), g (x) are simultaneously divisible
one by the other if and only if g (x) = cf (x), $c \neq 0$.

Finally, from VIII and I we get the property

IX. Any divisor of one of two polynomials f (x), cf (x), where
$c \neq 0$, is a divisor of the other polynomial as well.

Greatest common divisor. Suppose we have arbitrary polyno- 
mials f (x) and g (x). The polynomial ọ (x) is called the common 
divisor of f (x) and g (x) if it is a divisor of each of them. Property V 
(see above) shows that the common divisors of the polynomials 
f (x) and g (x) include all polynomials of degree zero. If there are 
no other common divisors of these two polynomials, then the poly- 
nomials are called relatively prime. 

But in the general case, the polynomials f (x) and g (x) may have 
divisors which depend on z; we wish to introduce the concept of the 
greatest common divisor of these polynomials. 

It would be inconvenient to take a definition stating that the
greatest common divisor of the polynomials f (x) and g (x) is their
common divisor of highest degree. On the one hand, as yet we do
not know whether f (x) and g (x) may have several different common
divisors of highest degree that differ by more than a zero-degree
factor. In other words, isn't this definition too indeterminate?
On the other hand, the reader will recall from elementary
arithmetic the problem of finding the greatest common divisor of integers
and also that the greatest common divisor 6 of the integers 12
and 18 is not only the greatest among the common divisors of these
numbers but is even divisible by any other of their common
divisors; the other common divisors of 12 and 18 are 1, 2, 3, -1, -2,
-3, -6.


That is why, for polynomials, we have the following definition. 

The greatest common divisor of the nonzero polynomials f (x)
and g (x) is a polynomial d (x), which is their common divisor and,
also, is itself divisible by any other common divisor of these
polynomials. The greatest common divisor of the polynomials f (x) and
g (x) is symbolized as (f (x), g (x)).

This definition leaves open the question of whether there exists
a greatest common divisor of any polynomials f (x) and g (x). We
will now answer this question in the affirmative. At the same time,
we will give a practical method for finding the greatest common
divisor of the given polynomials. Quite naturally, we cannot carry
over the procedure used for finding the greatest common divisor
of integers, since we do not as yet have anything analogous in
polynomials to the decomposition of an integer into a product of prime
factors. However, for integers there is also another method called
the algorithm of successive division, or Euclid's algorithm. This
procedure is quite applicable to polynomials.

Euclid's algorithm for polynomials consists in the following.
Let there be given the polynomials f (x) and g (x). We divide f (x)
by g (x) and obtain, generally speaking, a remainder $r_1(x)$. Then
we divide g (x) by $r_1(x)$ and get a remainder $r_2(x)$, divide $r_1(x)$ by
$r_2(x)$, and so on. Since the degrees of the remainders decrease
steadily, there will come a point in this sequence of divisions when
the division is exact and the procedure terminates. The remainder
$r_k(x)$ which divides exactly the preceding remainder $r_{k-1}(x)$ is the
greatest common divisor of the polynomials f (x) and g (x).

By way of proof, let us write the contents of the preceding
paragraph in the form of a chain of equations:

$$\begin{aligned}
f(x) &= g(x) q_1(x) + r_1(x), \\
g(x) &= r_1(x) q_2(x) + r_2(x), \\
r_1(x) &= r_2(x) q_3(x) + r_3(x), \\
&\;\;\vdots \\
r_{k-3}(x) &= r_{k-2}(x) q_{k-1}(x) + r_{k-1}(x), \\
r_{k-2}(x) &= r_{k-1}(x) q_k(x) + r_k(x), \\
r_{k-1}(x) &= r_k(x) q_{k+1}(x)
\end{aligned} \tag{2}$$

The last equation shows that $r_k(x)$ is a divisor of $r_{k-1}(x)$.
Whence it follows that both terms of the right member of the second
last equation are divisible by $r_k(x)$ and so $r_k(x)$ is also a divisor
of $r_{k-2}(x)$. Rising upwards in this fashion, we find that $r_k(x)$ is
also a divisor of $r_{k-3}(x), \dots, r_2(x), r_1(x)$. Whence, by virtue
of the second equation, it will follow that $r_k(x)$ is a divisor of g (x)
and therefore, on the basis of the first equation, a divisor of f (x)
as well. Thus, $r_k(x)$ is a common divisor of f (x) and g (x).

Now let us take an arbitrary common divisor φ (x) of the
polynomials f (x) and g (x). Since the left side and the first term of the
right side of the first of the equations (2) are divisible by φ (x),
it follows that $r_1(x)$ is also divisible by φ (x). Passing to the second
and successive equations, we find in the same way that the
polynomials $r_2(x), r_3(x), \dots$ are divisible by φ (x). Finally, if it is
proved that $r_{k-2}(x)$ and $r_{k-1}(x)$ are divisible by φ (x), then from
the second last equation we find that $r_k(x)$ is divisible by φ (x).
Thus, $r_k(x)$ is indeed the greatest common divisor of f (x) and g (x).

We have thus proved that any two polynomials have a greatest
common divisor, and we have a procedure for computing it. This
method shows that if the polynomials f (x) and g (x) both have rational
or real coefficients, then the coefficients of their greatest common divisor
will also be rational or real, though of course these polynomials
may also have other divisors, not all coefficients of which are rational
(real). Thus, the polynomials with rational coefficients

$$f(x) = x^3 - 3x^2 - 2x + 6, \quad g(x) = x^3 + x^2 - 2x - 2$$

have as greatest common divisor the polynomial with rational
coefficients $x^2 - 2$, though they have a common divisor $x - \sqrt{2}$, not
all the coefficients of which are rational.

If d (x) is the greatest common divisor of the polynomials f (x)
and g (x), then, as Properties VIII and IX (see above) show, for
the greatest common divisor of these polynomials we could also
choose the polynomial cd (x), where c is an arbitrary number
different from zero. In other words, the greatest common divisor of two
polynomials is only determined to within a factor of degree zero. In view
of this fact we can agree that the leading coefficient of the greatest 
common divisor of two polynomials will always be considered equal 
to unity. Using this condition, we can say that two polynomials are 
relatively prime if and only if their greatest common divisor is unity. 
Indeed, for the greatest common divisor of two relatively prime 
polynomials we can take any number different from zero; but mul- 
tiplying it by the inverse, we get unity. 


Example. Find the greatest common divisor of the polynomials

$$f(x) = x^4 + 3x^3 - x^2 - 4x - 3, \quad g(x) = 3x^3 + 10x^2 + 2x - 3$$

Applying Euclid's algorithm to polynomials with integral coefficients,
we can (to avoid fractional coefficients) multiply the dividend or reduce the
divisor by any nonzero number (this may be done either at the start or at any
other time in the division). Quite naturally, this will distort the quotient,
but the remainders that interest us will only acquire some factor of zero degree,
which as we know is quite permissible when seeking the greatest common
divisor.

We divide f (x) by g (x) but first multiply f (x) by 3:

$$\begin{array}{rl}
3x^4 + 9x^3 - 3x^2 - 12x - 9 & \text{dividend } 3f(x) \\
3x^4 + 10x^3 + 2x^2 - 3x & \text{subtract } x \cdot g(x) \\ \hline
-x^3 - 5x^2 - 9x - 9 & \text{(multiply by } -3) \\
3x^3 + 15x^2 + 27x + 27 & \\
3x^3 + 10x^2 + 2x - 3 & \text{subtract } 1 \cdot g(x) \\ \hline
5x^2 + 25x + 30 &
\end{array}$$

Thus, the first remainder, after dividing by 5, will be $r_1(x) = x^2 + 5x + 6$.
We divide the polynomial g (x) by it:

$$\begin{array}{rl}
3x^3 + 10x^2 + 2x - 3 & \text{dividend } g(x) \\
3x^3 + 15x^2 + 18x & \text{subtract } 3x \cdot r_1(x) \\ \hline
-5x^2 - 16x - 3 & \\
-5x^2 - 25x - 30 & \text{subtract } (-5) \cdot r_1(x) \\ \hline
9x + 27 &
\end{array}$$

The second remainder, after dividing by 9, is thus $r_2(x) = x + 3$. Since

$$r_1(x) = r_2(x) (x + 2)$$

it follows that $r_2(x)$ will be the last remainder which exactly divides the
preceding remainder. It will consequently be the desired greatest common divisor:

$$(f(x), g(x)) = x + 3$$
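
The example above can be checked mechanically. The following Python sketch runs Euclid's algorithm on the same two polynomials; it assumes rational coefficients (fractions.Fraction, so no rescaling tricks are needed), descending coefficient lists, and hypothetical helper names strip, poly_mod and poly_gcd.

```python
from fractions import Fraction

# Illustrative sketch; function names are assumptions, not from the text.
# Polynomials are coefficient lists in descending powers; [] is zero.


def strip(p):
    """Remove leading zero coefficients."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_mod(f, g):
    """Remainder of f divided by g (g must be nonzero with g[0] != 0)."""
    r = [Fraction(a) for a in f]
    g = [Fraction(b) for b in g]
    for i in range(len(r) - len(g) + 1):
        coeff = r[i] / g[0]
        for j, b in enumerate(g):
            r[i + j] -= coeff * b
    return strip(r)


def poly_gcd(f, g):
    """Euclid's algorithm; at least one of f, g must be nonzero.

    The pair (f, g) is replaced by (g, f mod g) until the remainder is
    zero; the last nonzero remainder is scaled to leading coefficient 1.
    """
    f = strip([Fraction(a) for a in f])
    g = strip([Fraction(b) for b in g])
    while g:
        f, g = g, poly_mod(f, g)
    return [a / f[0] for a in f]


f = [1, 3, -1, -4, -3]   # x^4 + 3x^3 - x^2 - 4x - 3
g = [3, 10, 2, -3]       # 3x^3 + 10x^2 + 2x - 3
print(poly_gcd(f, g))    # [Fraction(1, 1), Fraction(3, 1)], i.e. x + 3
```

Working with exact fractions leaves the quotients undistorted, at the cost of fractional intermediate remainders such as $-\frac{5}{9}(x^2 + 5x + 6)$.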


We use the Euclidean algorithm to prove the following theorem.

If d (x) is the greatest common divisor of the polynomials f (x)
and g (x), then it is possible to find polynomials u (x) and v (x) such that

$$f(x) u(x) + g(x) v(x) = d(x) \tag{3}$$

If the degrees of the polynomials f (x) and g (x) exceed zero, we can
then take it that the degree of u (x) is less than the degree of g (x), and
the degree of v (x) is less than the degree of f (x).

The proof rests on the equations (2). If we take into
consideration that $r_k(x) = d(x)$ and if we put $u_1(x) = 1$, $v_1(x) = -q_k(x)$,
then the second last of the equations (2) yields

$$d(x) = r_{k-2}(x) u_1(x) + r_{k-1}(x) v_1(x)$$

Substituting the expression of $r_{k-1}(x)$ in terms of $r_{k-3}(x)$ and $r_{k-2}(x)$
from the preceding equation of (2), we get

$$d(x) = r_{k-3}(x) u_2(x) + r_{k-2}(x) v_2(x)$$

where, obviously, $u_2(x) = v_1(x)$, $v_2(x) = u_1(x) - v_1(x) q_{k-1}(x)$.
Continuing upwards through the equations of (2), we finally arrive
at the equation (3) being proved.

To prove the second assertion of the theorem, assume that the
polynomials u (x) and v (x) which satisfy (3) have already been
found, but that, say, the degree of u (x) is greater than or equal to
the degree of g (x). Divide u (x) by g (x):

$$u(x) = g(x) q(x) + r(x)$$

where the degree of r (x) is less than the degree of g (x), and substitute
this expression into (3). We get the equation

$$f(x) r(x) + g(x) [v(x) + f(x) q(x)] = d(x)$$

The degree of the factor multiplying f (x) is now less than the degree
of g (x). The degree of the polynomial in square brackets will in turn be
less than the degree of f (x), since otherwise the degree of the second
summand in the left-hand member would not be less than the degree
of the product g (x) f (x), and since the degree of the first summand
is less than the degree of this product, the entire left side would
have a degree greater than or equal to the degree of g (x) f (x),
whereas the polynomial d (x) is definitely (given our assumptions) of
lower degree.

This proves the theorem. At the same time we see that if the
polynomials f (x) and g (x) have rational or real coefficients, then
we can also choose the polynomials u (x) and v (x), which satisfy
(3), so that their coefficients are rational or real.

Example. Find the polynomials u (x) and v (x) which satisfy (3) for

$$f(x) = x^3 - x^2 + 3x - 10, \quad g(x) = x^3 + 6x^2 - 9x - 14$$

Apply Euclid's algorithm to these polynomials. This time, when
performing the divisions, we cannot allow for any distortion of the quotients since
these quotients are used to find the polynomials u (x) and v (x). We obtain
the following system of equations:

$$\begin{aligned}
f(x) &= g(x) + (-7x^2 + 12x + 4), \\
g(x) &= (-7x^2 + 12x + 4) \Big( -\frac{1}{7}x - \frac{54}{49} \Big) + \frac{235}{49}(x - 2), \\
-7x^2 + 12x + 4 &= (x - 2)(-7x - 2)
\end{aligned}$$

Whence it follows that (f (x), g (x)) = x - 2 and that

$$u(x) = \frac{7}{235}x + \frac{54}{235}, \quad v(x) = -\frac{7}{235}x - \frac{5}{235}$$
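
The multipliers u (x) and v (x) of this example can also be obtained by machine. The Python sketch below is an assumption-laden illustration rather than the method of the proof: instead of substituting upwards through the chain (2), it carries the multipliers forward through the divisions, which yields the same identity (3). Coefficients are rational (fractions.Fraction), lists are in descending powers, and all function names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch; function names are assumptions, not from the text.
# Polynomials are coefficient lists in descending powers; [] is zero.


def strip(p):
    """Drop leading zero coefficients."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return list(p[i:])


def add(f, g):
    n = max(len(f), len(g))
    f = [0] * (n - len(f)) + list(f)
    g = [0] * (n - len(g)) + list(g)
    return strip([a + b for a, b in zip(f, g)])


def mul(f, g):
    if not f or not g:
        return []
    d = [Fraction(0)] * (len(f) + len(g) - 1)
    for k, a in enumerate(f):
        for l, b in enumerate(g):
            d[k + l] += a * b
    return strip(d)


def div(f, g):
    """Return (q, r) with f = g*q + r and deg r < deg g; g is nonzero."""
    r, q = list(f), []
    for i in range(len(f) - len(g) + 1):
        coeff = r[i] / g[0]
        q.append(coeff)
        for j, b in enumerate(g):
            r[i + j] -= coeff * b
    return strip(q), strip(r)


def bezout(f, g):
    """Return (d, u, v) with f*u + g*v = d, d being the monic gcd.

    At least one of f, g must be nonzero. Invariants kept in the loop:
    r0 = f*u0 + g*v0 and r1 = f*u1 + g*v1.
    """
    f = strip([Fraction(a) for a in f])
    g = strip([Fraction(b) for b in g])
    r0, u0, v0 = f, [Fraction(1)], []
    r1, u1, v1 = g, [], [Fraction(1)]
    while r1:
        q, r = div(r0, r1)
        minus_q = [-c for c in q]
        u_next = add(u0, mul(minus_q, u1))
        v_next = add(v0, mul(minus_q, v1))
        r0, u0, v0, r1, u1, v1 = r1, u1, v1, r, u_next, v_next
    lead = r0[0]                      # scale so that d is monic
    return ([c / lead for c in r0],
            [c / lead for c in u0],
            [c / lead for c in v0])


f = [1, -1, 3, -10]    # x^3 - x^2 + 3x - 10
g = [1, 6, -9, -14]    # x^3 + 6x^2 - 9x - 14
d, u, v = bezout(f, g)
print(d, u, v)
print(add(mul(f, u), mul(g, v)) == d)   # identity (3)
```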


Applying the above-proved theorem to relatively prime
polynomials, we get the following result.

The polynomials f (x) and g (x) are relatively prime if and only
if it is possible to find polynomials u (x) and v (x) that satisfy
the equation

$$f(x) u(x) + g(x) v(x) = 1 \tag{4}$$


Proceeding from this result, we can prove a number of simple
but important theorems on relatively prime polynomials:

(a) If a polynomial f (x) is relatively prime to each of the
polynomials φ (x) and ψ (x), then it is also relatively prime to their product.

Indeed, by (4), there are polynomials u (x) and v (x) such that

$$f(x) u(x) + \varphi(x) v(x) = 1$$

Multiplying this equation by ψ (x), we get

$$f(x) [u(x) \psi(x)] + [\varphi(x) \psi(x)] v(x) = \psi(x)$$

whence it follows that any common divisor of f (x) and φ (x) ψ (x)
would also be a divisor of ψ (x); however, it is given that
(f (x), ψ (x)) = 1.
(b) If the product of the polynomials f (x) and g (x) is divisible by
φ (x), but f (x) and φ (x) are relatively prime, then g (x) is divisible
by φ (x).

This is true since by multiplying the equation

$$f(x) u(x) + \varphi(x) v(x) = 1$$

by g (x), we get

$$[f(x) g(x)] u(x) + \varphi(x) [v(x) g(x)] = g(x)$$

Both terms of the left-hand member of this equation are divisible
by φ (x); hence g (x) is divisible by φ (x).

(c) If the polynomial f (x) is divisible by each of the polynomials
φ (x) and ψ (x), which are relatively prime, then f (x) is also divisible
by their product.

Indeed, $f(x) = \varphi(x) \bar{\varphi}(x)$, so that the product on the right is
divisible by ψ (x). Therefore, by (b), $\bar{\varphi}(x)$ is divisible by ψ (x),
$\bar{\varphi}(x) = \psi(x) \bar{\psi}(x)$, whence $f(x) = [\varphi(x) \psi(x)] \bar{\psi}(x)$.

The definition of greatest common divisor may be extended to
the case of any finite system of polynomials: the greatest common
divisor of the polynomials $f_1(x), f_2(x), \dots, f_s(x)$ is that common
divisor of these polynomials which is divisible by any other
common divisor of these polynomials. The existence of a greatest common
divisor for any finite system of polynomials is a consequence of the
following theorem, which also provides a procedure for
calculating it.

The greatest common divisor of the polynomials $f_1(x), f_2(x), \dots, f_s(x)$
is equal to the greatest common divisor of the polynomial
$f_s(x)$ and the greatest common divisor of the polynomials
$f_1(x), f_2(x), \dots, f_{s-1}(x)$.

Indeed, for s = 2 the theorem is obvious. We thus assume that
for the case s - 1 it holds true, that is, in particular, we have already
proved the existence of the greatest common divisor d (x) of the
polynomials $f_1(x), f_2(x), \dots, f_{s-1}(x)$. Denote by $\bar{d}(x)$ the
greatest common divisor of the polynomials d (x) and $f_s(x)$. It will
obviously be a common divisor of all the given polynomials. On the
other hand, any other common divisor of these polynomials will
also be a divisor of d (x) and, for this reason, of $\bar{d}(x)$ as well.
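
The theorem reduces the greatest common divisor of s polynomials to repeated two-polynomial computations, which is a left fold. A minimal Python sketch under the same assumptions as before (rational coefficients, descending coefficient lists, hypothetical names strip, poly_mod, poly_gcd, gcd_of_system); the three sample polynomials are made up for illustration and do not come from the text.

```python
from fractions import Fraction
from functools import reduce

# Illustrative sketch; names and sample data are assumptions.


def strip(p):
    """Remove leading zero coefficients (descending powers)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_mod(f, g):
    """Remainder of f divided by the nonzero polynomial g."""
    r = [Fraction(a) for a in f]
    g = [Fraction(b) for b in g]
    for i in range(len(r) - len(g) + 1):
        coeff = r[i] / g[0]
        for j, b in enumerate(g):
            r[i + j] -= coeff * b
    return strip(r)


def poly_gcd(f, g):
    """Monic greatest common divisor of two polynomials, not both zero."""
    f = strip([Fraction(a) for a in f])
    g = strip([Fraction(b) for b in g])
    while g:
        f, g = g, poly_mod(f, g)
    return [a / f[0] for a in f]


def gcd_of_system(polys):
    """(f_1, ..., f_s) computed as ((f_1, ..., f_{s-1}), f_s), step by step."""
    return reduce(poly_gcd, polys)


system = [
    [1, -3, 2],   # (x - 2)(x - 1)
    [1, -5, 6],   # (x - 2)(x - 3)
    [1, -1, -2],  # (x - 2)(x + 1)
]
print(gcd_of_system(system))   # the common factor x - 2
```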

In particular, the system of polynomials $f_1(x), f_2(x), \dots, f_s(x)$
is called relatively prime if only zero-degree
polynomials are the common divisors of these polynomials; that is to say,
if their greatest common divisor is unity. If s > 2, then these
polynomials need not be pairwise relatively prime. Thus, the system
of polynomials

$$f(x) = x^3 - 7x^2 + 7x + 15, \quad g(x) = x^3 - 6x^2 - 15x + 100, \quad h(x) = x^3 + x^2 - 12x$$

is relatively prime, although

$$(f(x), g(x)) = x - 5, \quad (f(x), h(x)) = x - 3, \quad (g(x), h(x)) = x + 4$$
The reader will readily obtain a generalization of the
above-proved theorems (a) to (c) on relatively prime polynomials to the
case of any finite number of polynomials.

22. Roots of Polynomials


We have already (Sec. 20) dealt with the values of a polynomial
when we spoke of the function-theoretic approach to the concept
of a polynomial. Let us recall the definition. If

$$f(x) = a_0x^n + a_1x^{n-1} + \dots + a_n \tag{1}$$

is some polynomial and c is a number, then the number

$$f(c) = a_0c^n + a_1c^{n-1} + \dots + a_n$$

obtained by replacing in (1) the unknown x by the number c and
by subsequent performance of all indicated operations, is called
the value of the polynomial f (x) for x = c. Quite naturally, if
f (x) = g (x) in the sense of an algebraic equality of polynomials as
defined in Sec. 20, then f (c) = g (c) for any c.

It is also easy to see that if

$$\varphi(x) = f(x) + g(x), \quad \psi(x) = f(x) g(x)$$

then

$$\varphi(c) = f(c) + g(c), \quad \psi(c) = f(c) g(c)$$


In other words, the addition and multiplication of polynomials 
defined in Sec. 20 become—from the function-theoretic approach 
to polynomials—the addition and multiplication of functions, to be 
understood in the sense of addition and multiplication of the appro- 
priate values of these functions. 
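
This compatibility of the formal operations with the values can be spot-checked numerically. The Python sketch below assumes ascending coefficient lists and hypothetical helper names; the sample polynomials and the point c = 2 + i are chosen with small integer parts so that the complex floating-point arithmetic is exact.

```python
# Illustrative sketch; names and sample data are assumptions.
# A polynomial a_0 + a_1 x + ... is the ascending list [a_0, a_1, ...].


def evaluate(p, c):
    """Value at x = c: the sum of a_k * c^k."""
    total, power = 0, 1
    for a in p:
        total += a * power
        power *= c
    return total


def poly_add(f, g):
    """Coefficientwise sum, shorter list padded with zeros."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]


def poly_mul(f, g):
    """Product coefficients: d_i is the sum of a_k * b_l over k + l = i."""
    d = [0] * (len(f) + len(g) - 1)
    for k, a in enumerate(f):
        for l, b in enumerate(g):
            d[k + l] += a * b
    return d


f = [1, 2j, 3]    # 1 + 2i x + 3 x^2
g = [0, 1, -3]    # x - 3 x^2
c = 2 + 1j

# Value of the sum (product) equals the sum (product) of the values.
print(evaluate(poly_add(f, g), c) == evaluate(f, c) + evaluate(g, c))
print(evaluate(poly_mul(f, g), c) == evaluate(f, c) * evaluate(g, c))
```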

If f (c) = 0, that is, the polynomial f (x) vanishes when the
number c is substituted in place of the unknown, then c is termed
a root of the polynomial f (x) [or of the equation f (x) = 0]. It will
now be shown that this concept belongs entirely to the theory
of divisibility of polynomials, which was the topic of discussion
in the preceding section.

If we divide the polynomial f (x) by an arbitrary polynomial
of degree one (or, as we shall say from now on, by a linear polynomial),
then the remainder will either be a polynomial of degree zero, or
zero, which is to say some number r. The following theorem allows
us to find this remainder without performing the division itself
when we divide by a polynomial of the form x - c.

The remainder resulting from the division of a polynomial f (x)
by a linear polynomial x - c is equal to the value f (c) of f (x) for
x = c.

Indeed, let

$$f(x) = (x - c) q(x) + r$$

Taking the values of both sides of this equation when x = c, we get

$$f(c) = (c - c) q(c) + r = r$$

which proves the theorem.

An exceedingly important corollary follows from this fact.

The number c is a root of the polynomial f (x) if and only if f (x)
is divisible by x - c.

On the other hand, if f (x) is divisible by some linear polynomial
ax + b, then evidently it is also divisible by the polynomial
$x - \left(-\frac{b}{a}\right)$, that is, by a polynomial of the form x - c. Thus,
finding the roots of a polynomial f (x) is equivalent to finding its linear
divisors.

In view of the foregoing, it is of interest to examine the method
of dividing a polynomial f (x) by a linear binomial x - c, which
is simpler than the general algorithm for dividing polynomials.
This method is called the Horner method. Let

$$f(x) = a_0x^n + a_1x^{n-1} + a_2x^{n-2} + \dots + a_n \tag{2}$$

and let

$$f(x) = (x - c) q(x) + r \tag{3}$$

where

$$q(x) = b_0x^{n-1} + b_1x^{n-2} + b_2x^{n-3} + \dots + b_{n-1}$$

Comparing the coefficients of like powers of x in (3), we get

$$\begin{aligned}
a_0 &= b_0, \\
a_1 &= b_1 - cb_0, \\
a_2 &= b_2 - cb_1, \\
&\;\;\vdots \\
a_{n-1} &= b_{n-1} - cb_{n-2}, \\
a_n &= r - cb_{n-1}
\end{aligned}$$

From this it follows that $b_0 = a_0$, $b_k = cb_{k-1} + a_k$, k = 1, 2, ..., n - 1,
that is, the coefficient $b_k$ is obtained by multiplying
the preceding coefficient $b_{k-1}$ by c and by adding the corresponding
coefficient $a_k$; finally, $r = cb_{n-1} + a_n$, that is, the remainder r,
which as we know is equal to f (c), is also obtained by the same
rule. Thus, the remainder and the coefficients of the quotient may be
successively obtained by computations of the same type, which
can be arranged in a scheme, as the following examples demonstrate.

Example 1. Divide $f(x) = 2x^5 - x^4 - 3x^3 + x - 3$ by x - 3.

Form an array in which the coefficients of the polynomial f (x) are located
above the bar, and the corresponding coefficients of the quotient and the
remainder (computed successively) are located below the bar; on the left is the value
of c in the given example:

$$\begin{array}{c|cccccc}
 & 2 & -1 & -3 & 0 & 1 & -3 \\ \hline
3 & 2 & 3 \cdot 2 - 1 = 5 & 3 \cdot 5 - 3 = 12 & 3 \cdot 12 + 0 = 36 & 3 \cdot 36 + 1 = 109 & 3 \cdot 109 - 3 = 324
\end{array}$$

Thus, the desired quotient will be

$$q(x) = 2x^4 + 5x^3 + 12x^2 + 36x + 109$$

and the remainder will be r = f (3) = 324.

Example 2. Divide $f(x) = x^4 - 8x^3 + x^2 + 4x - 9$ by x + 1.

$$\begin{array}{c|ccccc}
 & 1 & -8 & 1 & 4 & -9 \\ \hline
-1 & 1 & -9 & 10 & -6 & -3
\end{array}$$

The quotient will therefore be

$$q(x) = x^3 - 9x^2 + 10x - 6$$

and the remainder r = f (-1) = -3.


These examples show that the Horner method may also be used 
for quick computation of the value of a polynomial for a given value 
of the unknown. 
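
The recurrence $b_0 = a_0$, $b_k = c b_{k-1} + a_k$ translates almost line for line into a short program. The following is a minimal sketch in Python; the function name `horner_divide` and the convention of storing coefficients from the highest power down are illustrative assumptions, not notation used in the text.

```python
def horner_divide(coeffs, c):
    """Divide a_0 x^n + ... + a_n by (x - c).

    coeffs lists a_0, ..., a_n (highest power first; an assumed convention).
    Returns (quotient coefficients b_0..b_{n-1}, remainder r = f(c)).
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    row = [coeffs[0]]                # b_0 = a_0
    for a in coeffs[1:]:
        row.append(c * row[-1] + a)  # b_k = c * b_{k-1} + a_k; last entry is r
    return row[:-1], row[-1]


# The two worked examples from the text.
q1, r1 = horner_divide([2, -1, -3, 0, 1, -3], 3)
q2, r2 = horner_divide([1, -8, 1, 4, -9], -1)
print(q1, r1)
print(q2, r2)
```

Because the last entry of the bottom row is $f(c)$, the same loop serves as a routine for evaluating a polynomial at a point.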

Multiple roots. If $c$ is a root of the polynomial $f(x)$, i.e., $f(c) = 0$,
then $f(x)$ is, as we know, divisible by $x - c$. It may turn out
that the polynomial $f(x)$ is not only divisible by the first power
of the linear binomial $x - c$, but by higher powers of it as well.
In any case, there will be a natural number $k$ such that $f(x)$ is
exactly divisible by $(x - c)^k$, but is not divisible by $(x - c)^{k+1}$.
Therefore,
$$f(x) = (x - c)^k \varphi(x)$$
where the polynomial $\varphi(x)$ is no longer divisible by $x - c$, that
is, does not have $c$ as its root. The number $k$ is called the multiplicity
of the root $c$ in the polynomial $f(x)$, and the root $c$ is the $k$-fold root
of this polynomial. If $k = 1$, then we say that the root $c$ is simple.

The concept of a multiple root is closely related to the concept 
of the derivative of a polynomial. However, we are studying poly- 
nomials with any complex coefficients and for this reason we cannot 
simply take advantage of the concept of a derivative as introduced 
in the course of mathematical analysis. What follows is to be regar- 
ded as a definition of the derivative of a polynomial which is inde- 
pendent of that given in the course of analysis. 

Suppose we have an nth-degree polynomial 


$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$$
with arbitrary complex coefficients. Its derivative (first derivative)
is a polynomial of degree $n - 1$:
$$f'(x) = n a_0 x^{n-1} + (n - 1) a_1 x^{n-2} + \dots + 2 a_{n-2} x + a_{n-1}$$

The derivative of a polynomial of degree zero and the derivative
of zero are taken to be equal to zero. The derivative of the first
derivative is called the second derivative of the polynomial $f(x)$
and is denoted by $f''(x)$, etc. It is obvious that
$$f^{(n)}(x) = n!\, a_0$$
and therefore $f^{(n+1)}(x) = 0$; i.e., the $(n + 1)$th derivative of a
polynomial of degree $n$ is equal to zero.

In our case of polynomials with complex coefficients, we cannot
make use of the properties of a derivative as proved in the course
of analysis for polynomials with real coefficients; we have to prove
these properties once again using the definition of a derivative given
above. We are interested in the following properties, which are
called formulas for differentiating a sum and a product:

$$(f(x) + g(x))' = f'(x) + g'(x) \tag{4}$$
$$(f(x) \cdot g(x))' = f(x)\, g'(x) + f'(x)\, g(x) \tag{5}$$


These formulas can easily be verified, incidentally, by direct 
computation, by taking for f (x) and g (x) two arbitrary polynomials 
and applying the above definition of a derivative; we leave this 
to the reader. 

Formula (5) can readily be extended to the case of a product 
of any finite number of factors and therefore we can in the ordinary 
fashion derive a formula for the derivative of a power 


$$(f^k(x))' = k f^{k-1}(x)\, f'(x) \tag{6}$$

Our aim will be to prove the following theorem.

If the number $c$ is a $k$-fold root of the polynomial $f(x)$, then for
$k > 1$ it will be the $(k - 1)$-fold root of the first derivative of this
polynomial; but if $k = 1$, then $c$ will not be a root of $f'(x)$.

Let
$$f(x) = (x - c)^k \varphi(x), \quad k \geq 1 \tag{7}$$
where $\varphi(x)$ is no longer divisible by $x - c$. Differentiating
equation (7), we get
$$f'(x) = (x - c)^k \varphi'(x) + k (x - c)^{k-1} \varphi(x)
       = (x - c)^{k-1} \left[ (x - c)\, \varphi'(x) + k \varphi(x) \right]$$

The first term of the sum in the square brackets is divisible by
$x - c$, the second is not divisible by $x - c$; therefore, the whole
sum is not divisible by $x - c$. Taking into account that the quotient
of $f'(x)$ by $(x - c)^{k-1}$ is uniquely defined, we find that $(x - c)^{k-1}$
is the highest power of the binomial $x - c$ which divides the
polynomial $f'(x)$. The proof is complete.

Applying this theorem several times, we find that the k-fold 
root of polynomial f (x) is the (k — s)-fold root in the sth derivative 
of this polynomial (k > s) and for the first time will not be a root of the 
kth derivative of f (x). 
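
The statement can be checked on a concrete polynomial. The sketch below is an illustration only: the helper names (`horner_divide`, `derivative`, `multiplicity`) and the highest-power-first coefficient lists are assumptions of this example. It measures multiplicity by dividing repeatedly by $x - c$ with the Horner scheme, and assumes exact (integer or rational) coefficients so that the test `r != 0` is reliable.

```python
def horner_divide(coeffs, c):
    """Quotient and remainder of division by (x - c), highest power first."""
    row = [coeffs[0]]
    for a in coeffs[1:]:
        row.append(c * row[-1] + a)
    return row[:-1], row[-1]


def derivative(coeffs):
    """Formal derivative: n*a_0, (n-1)*a_1, ..., a_{n-1}."""
    n = len(coeffs) - 1
    return [(n - i) * a for i, a in enumerate(coeffs[:-1])]


def multiplicity(coeffs, c):
    """Largest k such that (x - c)^k divides the (nonzero) polynomial."""
    if not any(coeffs):
        raise ValueError("the zero polynomial has no well-defined multiplicity")
    k = 0
    while len(coeffs) > 1:
        quotient, r = horner_divide(coeffs, c)
        if r != 0:          # c is not a root of what is left
            break
        coeffs, k = quotient, k + 1
    return k


# f(x) = (x - 3)^2 (x - 5)(x + 2) = x^4 - 9x^3 + 17x^2 + 33x - 90
f = [1, -9, 17, 33, -90]
f1 = derivative(f)
f2 = derivative(f1)
print(multiplicity(f, 3), multiplicity(f1, 3), multiplicity(f2, 3))
```

For the double root 3 the multiplicities in $f$, $f'$, $f''$ drop by one at each differentiation, as the theorem predicts.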


23. Fundamental Theorem 


In examining the roots of polynomials in the preceding section 
we did not pose the question of whether every polynomial possesses 
roots. We know that there are polynomials with real coefficients
that do not have real roots; $x^2 + 1$ is such a polynomial. It might
be expected that there are polynomials which do not have roots
even in the class of complex numbers, particularly if we consider
polynomials with arbitrary complex coefficients. If this were the
case, then the system of complex numbers would require a further
extension. Actually, however, the following fundamental theorem
of the algebra of complex numbers is valid.

Every polynomial of degree at least one with arbitrary numerical 
coefficients has at least one root, which in the general case is complex. 

This theorem is one of the greatest attainments of the whole 
of mathematics and finds application in the most diverse spheres 
of science. In particular, it is the starting point of everything in the 
theory of polynomials with numerical coefficients and for this 
reason it was once called (and sometimes still is) the “fundamental 
theorem of higher algebra”. Actually, however, the fundamental 
theorem is not purely algebraic. All its proofs—and since Gauss 
first proved the theorem at the end of the eighteenth century a very 
large number have been found—are forced, in one degree or another, 
to make use of the so-called topological properties of the real and 
complex numbers, that is properties associated with continuity. 

In the proof which we now give, the polynomial $f(x)$ with complex
coefficients will be regarded as a complex function of a complex
variable $x$. Thus, $x$ can assume any complex values, or, taking
into account the mode of constructing complex numbers given
in Sec. 17, the variable $x$ ranges over the complex plane. The values
of the function f (x) will also be complex numbers. We may consider 
that these values are plotted on a second complex plane, as in the 
case of real functions of a real variable where the values of the 
independent variable are plotted on one number line (axis of abscis- 
sas) while the values of the function are plotted on the other line 
(axis of ordinates). 

The definition of a continuous function as given in the course 
of mathematical analysis is carried over to functions of a complex 
variable (in the formulation of the definition, absolute values are 
replaced by moduli). 

Namely, the complex function $f(x)$ of a complex variable $x$ is
continuous at a point $x_0$ if for any positive real number $\varepsilon$ there is
a positive real number $\delta$ such that for every increment $h$ (generally
speaking, complex) whose modulus satisfies the inequality $|h| < \delta$,
the inequality
$$|f(x_0 + h) - f(x_0)| < \varepsilon$$
holds true. A function $f(x)$ is called continuous if it is continuous
at all points $x_0$ at which it is defined (for a polynomial $f(x)$, this
means on the entire complex plane).

The polynomial $f(x)$ is a continuous function of the complex
variable $x$.

The proof of this theorem could be given as it is in the course
of mathematical analysis, namely, by showing that the sum and
the product of continuous functions are themselves continuous and
then noting that a function which is constantly equal to one and
the same complex number is continuous. However, we shall take
a different approach.

We first prove the particular case of the theorem when the constant
term of the polynomial $f(x)$ is zero; and we will only prove
the continuity of $f(x)$ at the point $x_0 = 0$. In other words, we will
prove the following lemma (in place of $h$ we write $x$).

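Lemma 1. If the constant term of the polynomial $f(x)$ is zero,
$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x$$
that is, $f(0) = 0$, then for any $\varepsilon > 0$ there is a $\delta > 0$ such that for
all $x$ for which $|x| < \delta$ it is true that $|f(x)| < \varepsilon$.

Indeed, let
$$A = \max(|a_0|, |a_1|, \dots, |a_{n-1}|)$$
We are already given the number $\varepsilon$. Let us show that if for the
number $\delta$ we take
$$\delta = \frac{\varepsilon}{A + \varepsilon} \tag{1}$$
then it will satisfy the required conditions.

Indeed,
$$|f(x)| \leq |a_0|\,|x|^n + |a_1|\,|x|^{n-1} + \dots + |a_{n-1}|\,|x|
        \leq A \left( |x|^n + |x|^{n-1} + \dots + |x| \right)$$
that is,
$$|f(x)| \leq A\, \frac{|x| - |x|^{n+1}}{1 - |x|}$$
Since $|x| < \delta$ and, by (1), $\delta < 1$, it follows that
$$\frac{|x| - |x|^{n+1}}{1 - |x|} < \frac{|x|}{1 - |x|}$$
and therefore
$$|f(x)| < \frac{A\,|x|}{1 - |x|} < \frac{A\,\delta}{1 - \delta}
        = \frac{A \cdot \frac{\varepsilon}{A + \varepsilon}}{1 - \frac{\varepsilon}{A + \varepsilon}} = \varepsilon$$
which completes the proof.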
Let us now derive the following formula. Suppose we have the
polynomial
$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$$
with arbitrary complex coefficients. Substitute in place of $x$ the
sum $x + h$, where $h$ is the second unknown. Using the binomial
theorem, expand each of the powers $(x + h)^k$, $k \leq n$, in the
right-hand member and collect terms with like powers of $h$. This yields
(as the reader can readily verify) the equation
$$f(x + h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \dots + \frac{h^n}{n!} f^{(n)}(x)$$

In other words, we prove Taylor's formula, which gives the expansion
of $f(x + h)$ in powers of the “increment” $h$.
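
Taylor's formula is easy to verify numerically for a particular polynomial. The sketch below is only an illustration under stated assumptions: coefficients are listed from the highest power down, the inputs are integers or rationals so that `Fraction` keeps the arithmetic exact, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def evaluate(coeffs, x):
    """Value of the polynomial at x (Horner scheme, highest power first)."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


def derivative(coeffs):
    """Formal derivative as defined in the text."""
    n = len(coeffs) - 1
    return [(n - i) * a for i, a in enumerate(coeffs[:-1])]


def taylor_coefficients(coeffs, x0):
    """Return [f(x0), f'(x0)/1!, f''(x0)/2!, ..., f^(n)(x0)/n!]."""
    result, current, j = [], list(coeffs), 0
    while current:
        result.append(Fraction(evaluate(current, x0), factorial(j)))
        current = derivative(current)
        j += 1
    return result


f = [1, 0, -2, 5]                 # f(x) = x^3 - 2x + 5
x0, h = 2, 3
c = taylor_coefficients(f, x0)    # coefficients of 1, h, h^2, h^3
lhs = evaluate(f, x0 + h)
rhs = sum(cj * h**j for j, cj in enumerate(c))
print(c, lhs, rhs)
```

Both sides agree for every choice of `x0` and `h`, since the formula is an identity between polynomials in the two unknowns.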

The continuity of an arbitrary polynomial $f(x)$ at any point $x_0$
is now proved as follows. By Taylor’s formula,
$$f(x_0 + h) - f(x_0) = c_1 h + c_2 h^2 + \dots + c_n h^n = \varphi(h)$$
where
$$c_1 = f'(x_0), \quad c_2 = \frac{1}{2!} f''(x_0), \quad \dots, \quad c_n = \frac{1}{n!} f^{(n)}(x_0)$$

The polynomial $\varphi(h)$ in the unknown $h$ is a polynomial without
a constant term, and so, by Lemma 1, for any $\varepsilon > 0$ there is a $\delta > 0$
such that for $|h| < \delta$ it is true that $|\varphi(h)| < \varepsilon$, i.e.,
$$|f(x_0 + h) - f(x_0)| < \varepsilon$$
which completes the proof.

From the inequality
$$\bigl|\, |f(x_0 + h)| - |f(x_0)| \,\bigr| \leq |f(x_0 + h) - f(x_0)|$$
based on formula (13), Sec. 18, and from the continuity, just proved,
of a polynomial there follows the continuity of the modulus $|f(x)|$
of the polynomial $f(x)$; this modulus is obviously a real nonnegative
function of the complex variable $x$.

We shall now prove the lemmas that are used in the proof of the 
fundamental theorem. 

Lemma on the modulus of the highest-degree term. If we have
an $n$th-degree polynomial, $n \geq 1$,
$$f(x) = a_0 x^n + a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_n$$
with arbitrary complex coefficients and if $k$ is any positive real number,
then for sufficiently large (in modulus) values of the unknown $x$ the
inequality
$$|a_0 x^n| > k\, |a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_n| \tag{2}$$
is true, that is, the modulus of the highest-degree term is greater than
the modulus of the sum of all the remaining terms; it is an arbitrary
number of times greater.

Indeed, let $A$ be the largest of the moduli of the coefficients
$a_1, a_2, \dots, a_n$:
$$A = \max(|a_1|, |a_2|, \dots, |a_n|)$$
Then (see, in Sec. 18, the properties of the moduli of a sum and
a product of complex numbers)
$$|a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_n|
  \leq |a_1|\,|x|^{n-1} + |a_2|\,|x|^{n-2} + \dots + |a_n|
  \leq A \left( |x|^{n-1} + |x|^{n-2} + \dots + 1 \right)
  = A\, \frac{|x|^n - 1}{|x| - 1}$$
Assuming $|x| > 1$, we get
$$\frac{|x|^n - 1}{|x| - 1} < \frac{|x|^n}{|x| - 1}$$
whence
$$|a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_n| < A\, \frac{|x|^n}{|x| - 1}$$

Thus, inequality (2) will be fulfilled if $x$ satisfies the condition
$|x| > 1$ and also the inequality
$$k A\, \frac{|x|^n}{|x| - 1} < |a_0 x^n| = |a_0|\,|x|^n$$
that is, if
$$|x| > \frac{kA}{|a_0|} + 1 \tag{3}$$

Since the right side of inequality (3) is greater than 1, it may be
asserted that, for values of $x$ satisfying this inequality, inequality
(2) holds true. This proves the lemma.

Lemma on the increase of the modulus of a polynomial. For
every polynomial $f(x)$ of degree not less than unity with complex
coefficients, and for any arbitrarily large positive real number $M$, it is
possible to find a positive real number $N$ such that for $|x| > N$ it will be
true that $|f(x)| > M$.

Let
$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_n$$
By formula (11), Sec. 18,
$$|f(x)| = |a_0 x^n + (a_1 x^{n-1} + \dots + a_n)|
        \geq |a_0 x^n| - |a_1 x^{n-1} + \dots + a_n| \tag{4}$$

Apply the lemma on the modulus of the highest-degree term, putting
$k = 2$; there is a number $N_1$ such that for $|x| > N_1$ it is true that
$$|a_0 x^n| > 2\, |a_1 x^{n-1} + \dots + a_n|$$
whence
$$|a_1 x^{n-1} + \dots + a_n| < \tfrac{1}{2}\, |a_0 x^n|$$
that is, by (4),
$$|f(x)| > |a_0 x^n| - \tfrac{1}{2}\, |a_0 x^n| = \tfrac{1}{2}\, |a_0 x^n|$$
The right side of this inequality is greater than $M$ for
$$|x| > N_2 = \sqrt[n]{\frac{2M}{|a_0|}}$$
Thus, for $|x| > N = \max(N_1, N_2)$ we have $|f(x)| > M$.
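
The two lemmas give an explicit, if generous, radius $N$. The following sketch computes it, taking $N_1$ from condition (3) with $k = 2$ and $N_2$ from the last inequality, and then samples a few circles outside that radius. It is a numerical illustration, not a proof; the names `growth_radius` and `evaluate` are assumptions of this example, and the sampling grid is arbitrary.

```python
import cmath


def evaluate(coeffs, x):
    """Value of the polynomial at x (highest power first)."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


def growth_radius(coeffs, M):
    """Radius N such that |x| > N implies |f(x)| > M, following the proof."""
    n = len(coeffs) - 1
    a0 = coeffs[0]
    if n < 1 or a0 == 0:
        raise ValueError("need degree >= 1 with a nonzero leading coefficient")
    A = max(abs(a) for a in coeffs[1:])      # largest modulus among a_1..a_n
    N1 = 2 * A / abs(a0) + 1                 # condition (3) with k = 2
    N2 = (2 * M / abs(a0)) ** (1.0 / n)      # makes |a_0 x^n| / 2 exceed M
    return max(N1, N2)


f = [1, 0, -2, 5]          # f(x) = x^3 - 2x + 5
M = 100
N = growth_radius(f, M)

# Sample points on circles of radius slightly and well beyond N.
samples = [cmath.rect(r, 2 * cmath.pi * t / 360)
           for r in (1.01 * N, 2 * N, 10 * N) for t in range(360)]
print(N, min(abs(evaluate(f, x)) for x in samples))
```

The bound is far from sharp: it only has to exist for the proof of the fundamental theorem, so no effort is made to shrink it.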

The meaning of this lemma may be illustrated geometrically
(we will frequently make use of this illustration). Suppose that at
every point $x_0$ of the complex plane a perpendicular is erected whose
length (for the given scale unit) is equal to the modulus of the value
of the polynomial $f(x)$ at this point, that is, is equal to $|f(x_0)|$.
The endpoints of the perpendiculars will, in view of the above-proved
continuity of the modulus of a polynomial, constitute some continuous
curved surface situated above the complex plane. The lemma on the
increase of the modulus of a polynomial shows that as $|x_0|$ increases
this surface recedes from the complex plane, though quite naturally the
recession is not in the least monotonic. Fig. 8 is a schematic view of
the line of intersection of this surface with a plane perpendicular to
the complex plane and passing through the point $O$.

[Fig. 8]

The following lemma plays a crucial role in the proof.

D’Alembert’s lemma. If for $x = x_0$ the polynomial $f(x)$ of degree
$n$, $n \geq 1$, does not vanish, $f(x_0) \neq 0$ and therefore $|f(x_0)| > 0$, then
it is possible to find an increment $h$ (complex in the general case) such
that
$$|f(x_0 + h)| < |f(x_0)|$$

If the increment $h$ is as yet arbitrary, then Taylor’s formula
yields
$$f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{h^2}{2!} f''(x_0) + \dots + \frac{h^n}{n!} f^{(n)}(x_0)$$

By hypothesis, $x_0$ is not a root of $f(x)$. It may, however, fortuitously
be a root of $f'(x)$ and perhaps also of certain other higher derivatives.
Let the $k$th derivative ($k \geq 1$) be the first that does not have
$x_0$ for a root, that is,
$$f'(x_0) = f''(x_0) = \dots = f^{(k-1)}(x_0) = 0, \quad f^{(k)}(x_0) \neq 0$$
Such a $k$ exists since if $a_0$ is the leading coefficient of the polynomial
$f(x)$, then
$$f^{(n)}(x_0) = n!\, a_0 \neq 0$$
Thus,
$$f(x_0 + h) = f(x_0) + \frac{h^k}{k!} f^{(k)}(x_0) + \frac{h^{k+1}}{(k+1)!} f^{(k+1)}(x_0) + \dots + \frac{h^n}{n!} f^{(n)}(x_0)$$

Some of the numbers $f^{(k+1)}(x_0), \dots, f^{(n-1)}(x_0)$ may also be zero,
but this does not affect our reasoning in any way.

Dividing both sides of the equation by $f(x_0)$, which, by hypothesis,
is different from zero, and introducing the notation
$$c_j = \frac{f^{(j)}(x_0)}{j!\, f(x_0)}, \quad j = k, k + 1, \dots, n$$
we get
$$\frac{f(x_0 + h)}{f(x_0)} = 1 + c_k h^k + c_{k+1} h^{k+1} + \dots + c_n h^n$$
or, because $c_k \neq 0$,
$$\frac{f(x_0 + h)}{f(x_0)} = (1 + c_k h^k) + c_k h^k \left( \frac{c_{k+1}}{c_k} h + \dots + \frac{c_n}{c_k} h^{n-k} \right)$$
Taking moduli, we get
$$\left| \frac{f(x_0 + h)}{f(x_0)} \right| \leq |1 + c_k h^k| + |c_k h^k| \left| \frac{c_{k+1}}{c_k} h + \dots + \frac{c_n}{c_k} h^{n-k} \right| \tag{5}$$

Up to this point we have not made any assumptions concerning
the increment $h$. Now we will choose $h$: we choose the modulus and
the argument separately. We choose the modulus of $h$ in the following
manner. Since
$$\frac{c_{k+1}}{c_k} h + \dots + \frac{c_n}{c_k} h^{n-k}$$
is a polynomial in $h$ without the constant term, it follows by Lemma 1
(setting $\varepsilon = \frac{1}{2}$) that there is a $\delta_1$ such that for $|h| < \delta_1$ it will
be true that
$$\left| \frac{c_{k+1}}{c_k} h + \dots + \frac{c_n}{c_k} h^{n-k} \right| < \frac{1}{2} \tag{6}$$
On the other hand, for
$$|h| < \delta_2 = \sqrt[k]{|c_k|^{-1}}$$
we have
$$|c_k h^k| < 1 \tag{7}$$
Assume that the modulus of $h$ is chosen in accord with the inequality
$$|h| < \min(\delta_1, \delta_2) \tag{8}$$
Then, because of (6), inequality (5) becomes the strict inequality
$$\left| \frac{f(x_0 + h)}{f(x_0)} \right| < |1 + c_k h^k| + \frac{1}{2}\, |c_k h^k| \tag{9}$$
We will use Condition (7) later on.

To choose the argument of $h$ we require that the number $c_k h^k$
be a negative real number. In other words,
$$\arg(c_k h^k) = \arg c_k + k \arg h = \pi$$
whence
$$\arg h = \frac{\pi - \arg c_k}{k} \tag{10}$$
In this choice of $h$, the number $c_k h^k$ will differ from its absolute value
in sign:
$$c_k h^k = -|c_k h^k|$$
and therefore, using inequality (7),
$$|1 + c_k h^k| = \bigl| 1 - |c_k h^k| \bigr| = 1 - |c_k h^k|$$

Thus, for $h$ chosen on the basis of the Conditions (8) and (10),
inequality (9) takes the form
$$\left| \frac{f(x_0 + h)}{f(x_0)} \right| < 1 - |c_k h^k| + \frac{1}{2}\, |c_k h^k| = 1 - \frac{1}{2}\, |c_k h^k|$$
And all the more so
$$\left| \frac{f(x_0 + h)}{f(x_0)} \right| = \frac{|f(x_0 + h)|}{|f(x_0)|} < 1$$
whence it follows that
$$|f(x_0 + h)| < |f(x_0)|$$

This completes the proof of d’Alembert’s lemma.
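
The proof is constructive, so the increment $h$ can actually be computed. The sketch below follows the choices made above: the modulus is taken as half of $\min(\delta_1, \delta_2)$, with $\delta_1 = \varepsilon/(A + \varepsilon)$ from Lemma 1 for $\varepsilon = \frac{1}{2}$, and the argument comes from (10). It is a floating-point illustration under stated assumptions: the function names are hypothetical, the factor one half is an arbitrary way of satisfying the strict inequality (8), and the exact test `c[j] != 0` for the first non-vanishing derivative is only trustworthy when the derivatives at $x_0$ are computed without rounding.

```python
import cmath
from math import factorial


def evaluate(coeffs, x):
    """Value of the polynomial at x (highest power first)."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


def derivative(coeffs):
    n = len(coeffs) - 1
    return [(n - i) * a for i, a in enumerate(coeffs[:-1])]


def dalembert_increment(coeffs, x0):
    """Return h with |f(x0 + h)| < |f(x0)|; requires degree >= 1, f(x0) != 0."""
    if len(coeffs) < 2:
        raise ValueError("degree must be at least 1")
    f0 = evaluate(coeffs, x0)
    if f0 == 0:
        raise ValueError("x0 is already a root")
    # c[j] = f^(j)(x0) / (j! f(x0)) for j = 1..n; index 0 is unused.
    c, current, j = [None], derivative(coeffs), 1
    while current:
        c.append(evaluate(current, x0) / (factorial(j) * f0))
        current, j = derivative(current), j + 1
    n = len(c) - 1
    k = next(j for j in range(1, n + 1) if c[j] != 0)
    A = max((abs(c[j] / c[k]) for j in range(k + 1, n + 1)), default=0.0)
    delta1 = 0.5 / (A + 0.5)               # Lemma 1 with epsilon = 1/2
    delta2 = abs(c[k]) ** (-1.0 / k)       # ensures |c_k h^k| < 1
    modulus = 0.5 * min(delta1, delta2)    # strictly inside condition (8)
    argument = (cmath.pi - cmath.phase(c[k])) / k   # condition (10)
    return cmath.rect(modulus, argument)


for poly, x0 in [([1, 0, 1], 0), ([1, 0, -2, 5], 1 + 1j)]:
    h = dalembert_increment(poly, x0)
    print(h, abs(evaluate(poly, x0)), abs(evaluate(poly, x0 + h)))
```

For $f(x) = x^2 + 1$ at $x_0 = 0$ the computed increment is purely imaginary up to rounding, which shows why complex increments are needed: no real step away from 0 decreases $|x^2 + 1|$.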

Using the geometric interpretation given earlier, we can describe
d’Alembert’s lemma in the following fashion. Given that $|f(x_0)| > 0$.
This means that the length of the perpendicular erected to the
complex plane at point $x_0$ is nonzero. Then, by d’Alembert’s lemma,
there is a point $x_1 = x_0 + h$ such that $|f(x_1)| < |f(x_0)|$; that is,
the perpendicular at the point $x_1$ will be shorter than at the point
$x_0$ and, consequently, the surface formed by the endpoints of the
perpendiculars will at this new point be somewhat closer to the
complex plane. As the proof of the lemma shows, the modulus
of $h$ may be taken as small as we wish; in other words, the point $x_1$
may be chosen arbitrarily close to the point $x_0$. However, we will
not take advantage of this remark in the future.

Obviously, the roots of the polynomial $f(x)$ will be those complex
numbers (or those points of the complex plane) at which the
surface formed by the endpoints of the perpendiculars touches this
plane. It is impossible to prove the existence of such points by
relying on d’Alembert’s lemma alone. Indeed, using this lemma it is
possible to find an infinite sequence of points $x_0, x_1, x_2, \dots$
such that
$$|f(x_0)| > |f(x_1)| > |f(x_2)| > \dots \tag{11}$$
However, it does not follow from this that there exists a point $x$
such that $f(x) = 0$, all the more so that the decreasing sequence
of positive real numbers (11) does not necessarily have to tend
to zero.

The considerations that follow are based on a theorem from the 
theory of functions of a complex variable that generalizes the 
Weierstrass theorem, which is familiar to the reader from the course 
of mathematical analysis. It has to do with real functions of a com- 
plex variable, that is with functions of a complex variable that 
take on only real values. The modulus of a polynomial is an instance 
of such functions. For the sake of simplicity, in the statement of 
this theorem we will speak about a closed circle E to be understood 
as a circle in the complex plane with all boundary points included. 

If a real function $g(x)$ of a complex variable $x$ is continuous at all
points of a closed circle $E$, then there exists in $E$ a point $x_0$ such that
for all $x$ in $E$ the inequality $g(x) \geq g(x_0)$ holds. Consequently, the
point $x_0$ is the minimum point of $g(x)$ in the circle $E$.

The proof of this theorem is given in all courses of complex
function theory and so we omit it.

We confine ourselves to the case when the function $g(x)$ is
nonnegative at all points of $E$ (only this case is of interest to us) and
will try to explain this theorem geometrically with the aid of the
illustration used earlier. Draw a perpendicular of length $g(x_0)$ at
every point $x_0$ of the circle $E$. The endpoints of these perpendiculars
constitute a piece of a continuous curved surface, and due to the
closed nature of the circle $E$ the existence of minimum points of this
piece of surface is geometrically clear. This illustration does not
of course take the place of a proof of the theorem.

We can now take up the proof of the fundamental theorem itself.
Let there be given a polynomial $f(x)$ of degree $n$, $n \geq 1$. If its
constant term is $a_n$, then obviously $f(0) = a_n$. Let us apply to our
polynomial the lemma on the increase of the modulus of a polynomial,
assuming $M = |f(0)| = |a_n|$. Consequently, there exists
an $N$ such that for $|x| > N$ it will be true that $|f(x)| > |f(0)|$.
It is then obvious that the above-indicated generalization of the
Weierstrass theorem is applicable to the function $|f(x)|$ for any
choice of the closed circle $E$. For $E$ we take a closed circle of radius
$N$ with centre at 0. Let point $x_0$ be the minimum point of $|f(x)|$
in $E$; whence, in particular, it follows that $|f(x_0)| \leq |f(0)|$.

It is easy to see that $x_0$ will actually serve as minimum point of
$|f(x)|$ over the entire complex plane: if the point $x'$ lies outside $E$,
then $|x'| > N$ and for this reason
$$|f(x')| > |f(0)| \geq |f(x_0)|$$

Whence it follows, finally, that $f(x_0) = 0$, or that $x_0$ serves as a root
of $f(x)$. If we had had $f(x_0) \neq 0$, then, by d’Alembert’s lemma, there
would be a point $x_1$ such that $|f(x_1)| < |f(x_0)|$. However, this
contradicts the property of point $x_0$ that we have just established.

Another proof of the fundamental theorem will be given in 
Sec. 55. 


24. Corollaries to the Fundamental Theorem 
Suppose we have a polynomial of degree $n$, $n \geq 1$,
$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n \tag{1}$$
with arbitrary complex coefficients. We again regard it as a formal
algebraic expression which is fully defined by the set of its
coefficients. The fundamental theorem on the existence of a root that
was proved in the preceding section permits asserting the existence
of a complex or real root $\alpha_1$ of $f(x)$. Therefore, the polynomial $f(x)$
has the factorization
$$f(x) = (x - \alpha_1)\, \varphi(x)$$

The coefficients of the polynomial $\varphi(x)$ are again real or complex
numbers, and therefore $\varphi(x)$ has a root $\alpha_2$, whence
$$f(x) = (x - \alpha_1)(x - \alpha_2)\, \psi(x)$$

Continuing in similar fashion, we arrive, after a finite number
of steps, at a factorization of the $n$th-degree polynomial $f(x)$ into
a product of $n$ linear factors,
$$f(x) = a_0 (x - \alpha_1)(x - \alpha_2) \dots (x - \alpha_n) \tag{2}$$

The coefficient $a_0$ is a result of the following: if we had a
coefficient $b$ on the right of (2), then after removal of parentheses the
highest-degree term of the polynomial $f(x)$ would be of the form
$b x^n$, though in reality, by (1), it is the term $a_0 x^n$. Therefore, $b = a_0$.

For the polynomial $f(x)$, expansion (2) is, to within the order of
the factors, a unique expansion of that type.

Let there be yet another expansion
$$f(x) = a_0 (x - \beta_1)(x - \beta_2) \dots (x - \beta_n) \tag{3}$$
From (2) and (3) follows the equation
$$(x - \alpha_1)(x - \alpha_2) \dots (x - \alpha_n) = (x - \beta_1)(x - \beta_2) \dots (x - \beta_n) \tag{4}$$

If the root $\alpha_i$ were different from all $\beta_j$, $j = 1, 2, \dots, n$, then,
substituting $\alpha_i$ in place of the unknown into (4), we would have
zero on the left and a nonzero number on the right. Thus, every
root $\alpha_i$ is equal to some root $\beta_j$ and conversely.

From this it does not yet follow that the expansions (2) and
(3) are coincident. Indeed, there may be equal roots among the
roots $\alpha_i$, $i = 1, 2, \dots, n$. For example, let $s$ of these roots be
equal to $\alpha_1$ and, on the other hand, let there be $t$ roots equal to the
root $\alpha_1$ among the roots $\beta_j$, $j = 1, 2, \dots, n$. We have to show that
$s = t$.

Since the degree of a product of polynomials is equal to the
sum of the degrees of the factors, the product of two polynomials
different from zero cannot be zero. It then follows that if two
products of polynomials are equal, then a common factor can be
cancelled from both sides of the equation: if
$$f(x)\, \varphi(x) = g(x)\, \varphi(x)$$
and $\varphi(x) \neq 0$, then from
$$[f(x) - g(x)]\, \varphi(x) = 0$$
it follows that
$$f(x) - g(x) = 0$$
that is,
$$f(x) = g(x)$$

Let us apply this to equation (4). If, for instance, $s > t$, then
by cancelling the factor $(x - \alpha_1)^t$ out of both sides of (4), we arrive
at an equation whose left side contains the factor $x - \alpha_1$ and whose
right side does not contain it. But it has been shown that this is
a contradiction, which proves the uniqueness of the expansion (2)
of the polynomial $f(x)$.

Collecting like factors, we can write (2) as
$$f(x) = a_0 (x - \alpha_1)^{k_1} (x - \alpha_2)^{k_2} \dots (x - \alpha_l)^{k_l} \tag{5}$$
where
$$k_1 + k_2 + \dots + k_l = n$$

It is now assumed that there are no equal roots among the roots
$\alpha_1, \alpha_2, \dots, \alpha_l$.

We will prove that the number $k_i$ of (5), $i = 1, 2, \dots, l$, is the
multiplicity of the root $\alpha_i$ in the polynomial $f(x)$. Indeed, if this
multiplicity is equal to $s_i$, then $k_i \leq s_i$. However, let $k_i < s_i$. By virtue
of the definition of multiplicity of a root of $f(x)$, we have the expansion
$$f(x) = (x - \alpha_i)^{s_i} \varphi(x)$$

Replacing in this expansion the factor $\varphi(x)$ by its factorization
into linear factors, we would get for $f(x)$ a factorization into linear
factors that is definitely different from (2); in other words, it would
contradict the above-proved uniqueness of the expansion.

We have thus proved the following important result.

Any polynomial $f(x)$ of degree $n$, $n \geq 1$, with arbitrary numerical
coefficients has $n$ roots if each of the roots is counted to the degree of its
multiplicity.

Note that this theorem holds true for $n = 0$ as well, since a polynomial
of zero degree quite naturally has no roots. This theorem
is not applicable only to the polynomial 0, which has no degree
and is equal to zero for any value of $x$. We use this last remark in
the proof of the following theorem.

If the polynomials $f(x)$ and $g(x)$ whose degrees do not exceed $n$ have
equal values for more than $n$ distinct values of the unknown, then
$f(x) = g(x)$.

Indeed, the polynomial $f(x) - g(x)$ has, by hypothesis, more
roots than $n$, and since its degree does not exceed $n$, the equation
$f(x) - g(x) = 0$ must be true.

Thus, taking into account that there is an infinity of different
numbers, we can assert that for any two distinct polynomials $f(x)$
and $g(x)$ there will be values $c$ of the unknown $x$ such that
$f(c) \neq g(c)$. Such $c$ may be found not only among the complex numbers
but also among the real numbers, rational numbers and even the
integers.

Consequently, two polynomials with numerical coefficients
having different coefficients of at least one power of the unknown $x$
will be distinct complex functions of the complex variable $x$. Finally,
this proves the equivalence, for polynomials with numerical coefficients,
of the two definitions of equality of polynomials given in Sec. 20: the
algebraic definition and the function-theoretic definition.

The theorem proved above permits us to assert that a polynomial
whose degree does not exceed $n$ is completely determined by its values
for any distinct values of the unknown whose number is greater than $n$.
Can these values of the polynomial be specified arbitrarily? If we
assume that the values of a polynomial are given for $n + 1$ distinct
values of the unknown, then the answer is yes: there always exists
a polynomial of degree not higher than $n$ which takes on preassigned
values for $n + 1$ specified distinct values of the unknown.

Indeed, let it be necessary to construct a polynomial of degree
not higher than $n$, which, for values of the unknown $a_1, a_2, \dots, a_{n+1}$
(assumed distinct), takes on, respectively, the values
$c_1, c_2, \dots, c_{n+1}$. The polynomial will be
$$f(x) = \sum_{i=1}^{n+1} c_i\,
  \frac{(x - a_1) \dots (x - a_{i-1})(x - a_{i+1}) \dots (x - a_{n+1})}
       {(a_i - a_1) \dots (a_i - a_{i-1})(a_i - a_{i+1}) \dots (a_i - a_{n+1})} \tag{6}$$

Indeed, its degree does not exceed $n$ and the value of $f(a_i)$ is equal
to $c_i$.
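
Formula (6) can be evaluated directly, term by term. The sketch below does so with exact rational arithmetic; the function name `lagrange_value` is an assumption of this example, and the inputs are assumed to be integers or `Fraction` values so that no rounding occurs.

```python
from fractions import Fraction


def lagrange_value(points, values, x):
    """Value at x of the polynomial of degree <= n defined by formula (6).

    points: the n + 1 distinct numbers a_1, ..., a_{n+1}
    values: the prescribed values c_1, ..., c_{n+1}
    """
    if len(points) != len(values) or len(set(points)) != len(points):
        raise ValueError("need equally many values and pairwise distinct points")
    total = Fraction(0)
    for i, (a_i, c_i) in enumerate(zip(points, values)):
        term = Fraction(c_i)
        for j, a_j in enumerate(points):
            if j != i:
                term *= Fraction(x - a_j, a_i - a_j)   # one factor of the i-th summand
        total += term
    return total


# Three points determine a polynomial of degree at most 2;
# here the data come from 3x^2 - x + 1.
points, values = [0, 1, 2], [1, 3, 11]
print([lagrange_value(points, values, a) for a in points])
print(lagrange_value(points, values, 3))
```

Evaluating at one of the points $a_i$ kills every summand except the $i$th, whose fraction equals 1, which is exactly the argument used in the text.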

Formula (6) is called the Lagrange interpolation formula. The
term “interpolation” is due to the fact that, using this formula and
knowing the values of the polynomial at $n + 1$ points, it is possible
to compute its values at all other points.

Vieta’s formulas. Let there be given a polynomial $f(x)$ of degree
$n$ with leading coefficient 1,
$$f(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_{n-1} x + a_n \tag{7}$$
and let $\alpha_1, \alpha_2, \dots, \alpha_n$ be its roots (counting multiplicities). Then
$f(x)$ has the following expansion:
$$f(x) = (x - \alpha_1)(x - \alpha_2) \dots (x - \alpha_n)$$

Multiplying out the parentheses on the right, and then collecting
like terms and comparing the resulting coefficients with the
coefficients of (7), we get the following equations, called Vieta’s formulas,
which express the coefficients of a polynomial in terms of its roots:
$$\begin{aligned}
a_1 &= -(\alpha_1 + \alpha_2 + \dots + \alpha_n), \\
a_2 &= \alpha_1 \alpha_2 + \alpha_1 \alpha_3 + \dots + \alpha_1 \alpha_n + \alpha_2 \alpha_3 + \dots + \alpha_{n-1} \alpha_n, \\
a_3 &= -(\alpha_1 \alpha_2 \alpha_3 + \alpha_1 \alpha_2 \alpha_4 + \dots + \alpha_{n-2} \alpha_{n-1} \alpha_n), \\
&\;\;\vdots \\
a_{n-1} &= (-1)^{n-1} (\alpha_1 \alpha_2 \dots \alpha_{n-1} + \alpha_1 \alpha_2 \dots \alpha_{n-2} \alpha_n + \dots + \alpha_2 \alpha_3 \dots \alpha_n), \\
a_n &= (-1)^n \alpha_1 \alpha_2 \dots \alpha_n
\end{aligned}$$
Thus, the right side of the $k$th equation, $k = 1, 2, \dots, n$, contains
a sum of all possible products of $k$ roots taken with the plus
sign or minus sign, according as $k$ is even or odd.

For $n = 2$, these formulas become the familiar (from elementary
algebra) relationship between the roots and the coefficients of a
quadratic polynomial. For $n = 3$, that is, for a cubic polynomial, these
formulas take the form
$$a_1 = -(\alpha_1 + \alpha_2 + \alpha_3), \quad a_2 = \alpha_1 \alpha_2 + \alpha_1 \alpha_3 + \alpha_2 \alpha_3, \quad a_3 = -\alpha_1 \alpha_2 \alpha_3$$

The Vieta formulas simplify writing a polynomial, given its
roots. For instance, find the fourth-degree polynomial $f(x)$ which
has the simple roots 5 and $-2$ and the double root 3. We get
$$\begin{aligned}
a_1 &= -(5 - 2 + 3 + 3) = -9, \\
a_2 &= 5 \cdot (-2) + 5 \cdot 3 + 5 \cdot 3 + (-2) \cdot 3 + (-2) \cdot 3 + 3 \cdot 3 = 17, \\
a_3 &= -[5 \cdot (-2) \cdot 3 + 5 \cdot (-2) \cdot 3 + 5 \cdot 3 \cdot 3 + (-2) \cdot 3 \cdot 3] = 33, \\
a_4 &= 5 \cdot (-2) \cdot 3 \cdot 3 = -90
\end{aligned}$$
and therefore
$$f(x) = x^4 - 9x^3 + 17x^2 + 33x - 90$$
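
The worked example can be reproduced mechanically in two independent ways: by summing products of $k$ roots as Vieta's formulas prescribe, and by multiplying out the linear factors one at a time. The sketch below does both; the function names are illustrative assumptions, and `math.prod` requires Python 3.8 or later.

```python
from itertools import combinations
from math import prod


def vieta_coefficient(roots, k):
    """a_k = (-1)^k * (sum of all products of k of the roots)."""
    return (-1) ** k * sum(prod(combo) for combo in combinations(roots, k))


def monic_from_roots(roots):
    """Coefficients [1, a_1, ..., a_n] of (x - r_1)(x - r_2)...(x - r_n)."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                   # current polynomial times x
        scaled = [0] + [r * a for a in coeffs]   # current polynomial times r
        coeffs = [s - t for s, t in zip(shifted, scaled)]
    return coeffs


roots = [5, -2, 3, 3]          # the double root 3 is listed twice
by_vieta = [1] + [vieta_coefficient(roots, k) for k in range(1, len(roots) + 1)]
by_expansion = monic_from_roots(roots)
print(by_vieta)
print(by_expansion)
```

Listing a multiple root as many times as its multiplicity is what makes the combinations in `vieta_coefficient` match the sums written out in the example.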


If the leading coefficient $a_0$ of the polynomial $f(x)$ is different
from unity, then in order to make use of Vieta’s formulas, it is first
necessary to divide all the coefficients by $a_0$; this has no effect on
the roots of the polynomial. Thus, in this case the Vieta formulas
yield an expression for the relation of all coefficients to the leading
coefficient.

Polynomials with real coefficients. We now derive some corollaries
to the fundamental theorem of algebra which refer to polynomials
with real coefficients. Actually, it is precisely from these
corollaries that the great significance of the fundamental theorem
of the algebra of complex numbers stems.

Let the following polynomial with real coefficients
$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$$
have a complex root $\alpha$, that is,
$$a_0 \alpha^n + a_1 \alpha^{n-1} + \dots + a_{n-1} \alpha + a_n = 0$$

We know that this equation is unaffected by changing all the numbers
to their conjugates; but all the coefficients $a_0, a_1, \dots, a_{n-1}, a_n$
and also the number 0 on the right, being real, will remain unchanged
in such a substitution, and we arrive at the equation
$$a_0 \bar{\alpha}^n + a_1 \bar{\alpha}^{n-1} + \dots + a_{n-1} \bar{\alpha} + a_n = 0$$
that is,
$$f(\bar{\alpha}) = 0$$
Thus, if a complex (but not real) number $\alpha$ serves as a root of a
polynomial $f(x)$ with real coefficients, then the conjugate number $\bar{\alpha}$ will
also be a root of $f(x)$.

Consequently, the polynomial $f(x)$ will be divisible by the
quadratic trinomial
$$\varphi(x) = (x - \alpha)(x - \bar{\alpha}) = x^2 - (\alpha + \bar{\alpha})\, x + \alpha \bar{\alpha} \tag{8}$$
whose coefficients, as we know from Sec. 18, are real. Taking advantage
of this fact, we will prove that the roots $\alpha$ and $\bar{\alpha}$ have one and
the same multiplicity in the polynomial $f(x)$.

Indeed, let these roots have, respectively, the multiplicities $k$
and $l$ and, say, let $k > l$. Then $f(x)$ is divisible by the $l$th power
of the polynomial $\varphi(x)$:
$$f(x) = \varphi^l(x)\, q(x)$$

The polynomial $q(x)$, as a quotient of two polynomials with real
coefficients, also has real coefficients; but, in conflict with what
was proved above, it has the number $\alpha$ for its $(k - l)$-fold root,
whereas the number $\bar{\alpha}$ is not one of its roots. This means that $k = l$.

Now we can say that the complex roots of any polynomial with 
real coefficients are pairwise conjugate. From this fact and from the 
earlier proved uniqueness of expansions of type (2) follows the 
final result. 

Any polynomial $f(x)$ with real coefficients can be expressed uniquely
(to within the order of the factors) in the form of a product of its
leading coefficient $a_0$ and several linear polynomials with real
coefficients (of the form $x - \alpha$, corresponding to its real roots) and
quadratic polynomials of the form (8) that correspond to pairs of
conjugate complex roots.

For what follows it will be useful to stress that among polynomials
with real coefficients and leading coefficient 1, only linear
polynomials of the form $x - \alpha$ and quadratic polynomials of the
form (8) are irreducible (that is, cannot be decomposed into factors
of lower degree).
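
The statement about conjugate roots can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `horner` and `real_quadratic` and the sample polynomial $x^3 - 5x^2 + 11x - 15 = (x - 3)(x^2 - 2x + 5)$ are chosen here purely for illustration.

```python
def horner(coeffs, x):
    """Evaluate a_0 x^n + ... + a_n; coeffs = [a_0, ..., a_n]."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

def real_quadratic(alpha):
    """Coefficients of (x - alpha)(x - conj(alpha)), which are always real."""
    return [1.0, -2.0 * alpha.real, alpha.real ** 2 + alpha.imag ** 2]

# Illustrative polynomial with real coefficients and the non-real root 1 + 2i.
f = [1, -5, 11, -15]
alpha = 1 + 2j

value_at_alpha = horner(f, alpha)
value_at_conjugate = horner(f, alpha.conjugate())
quadratic_factor = real_quadratic(alpha)   # x^2 - 2x + 5
```

Both values vanish, and the quadratic factor of type (8) built from the pair of conjugate roots has real coefficients.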


25. Rational Fractions 


The course of mathematical analysis deals with integral rational
functions (which we have called polynomials) and also fractional
rational functions. The latter are quotients $\frac{f(x)}{g(x)}$ of two
integral rational functions, where $g(x) \neq 0$. Algebraic operations are
performed on these functions in accord with the same laws as are used
to manipulate rational numbers, that is to say, fractions with inte- 
gral numerators and denominators. The equality of two fractional 
rational functions, or, as we will now term them, rational fractions, 
is to be understood in the same sense as the equality of fractions 
in elementary arithmetic. For the sake of definiteness, we consider 
rational fractions with real coefficients. The reader will easily note 
that this whole section can almost literally be extended to the case 
of rational fractions with complex coefficients. 

A rational fraction is in lowest terms (simplified) if the numerator 
is relatively prime to the denominator. 

Any rational fraction is equal to some fraction in lowest terms 
which is uniquely defined to within a zero-power factor common to both 
numerator and denominator. 

Indeed, any rational fraction may be reduced by dividing numerator
and denominator by their greatest common divisor; this yields
an equivalent fraction in lowest terms. If, moreover, we have two
simplified fractions $\frac{f(x)}{g(x)}$ and $\frac{\varphi(x)}{\psi(x)}$ that are equal, that is

$$f(x)\,\psi(x) = g(x)\,\varphi(x) \tag{1}$$

then it follows from the relative primality of $f(x)$ and $g(x)$ [by
Property (b) of Sec. 21] that $f(x)$ divides $\varphi(x)$, and from the
relative primality of $\varphi(x)$ and $\psi(x)$ that $\varphi(x)$ divides $f(x)$. Thus,
$f(x) = c\varphi(x)$, and then from (1) it follows that $g(x) = c\psi(x)$.

A rational fraction is proper if the degree of the numerator is
less than the degree of the denominator. If we include the polynomial
zero in the set of proper fractions, then the following theorem
holds.

Any rational fraction may be represented uniquely in the form of
a sum of a polynomial and a proper fraction.

If there is a rational fraction $\frac{f(x)}{g(x)}$ and if, dividing the numerator
by the denominator, we get the equation

$$f(x) = g(x)\,q(x) + r(x)$$

where the degree of $r(x)$ is less than the degree of $g(x)$, then it is
easy to check that

$$\frac{f(x)}{g(x)} = q(x) + \frac{r(x)}{g(x)}$$

If we also have the equation

$$\frac{f(x)}{g(x)} = \bar q(x) + \frac{\varphi(x)}{\psi(x)}$$

where the degree of $\varphi(x)$ is less than the degree of $\psi(x)$, then we
obtain the equation

$$q(x) - \bar q(x) = \frac{\varphi(x)}{\psi(x)} - \frac{r(x)}{g(x)} = \frac{\varphi(x)\,g(x) - \psi(x)\,r(x)}{\psi(x)\,g(x)}$$

Since the left-hand side is a polynomial, and the right, as is easily
seen, is a proper fraction, we get $q(x) - \bar q(x) = 0$ and

$$\frac{\varphi(x)}{\psi(x)} = \frac{r(x)}{g(x)}$$

Proper rational fractions can be studied further. As was pointed
out at the end of the last section, irreducible real polynomials are
polynomials of the form $x - \alpha$, where the number $\alpha$ is real, and
polynomials of the form $x^2 - (\beta + \bar\beta)x + \beta\bar\beta$, where $\beta$ and $\bar\beta$ are
a pair of conjugate complex numbers. It is easy to verify that in the
complex case a similar role is played by polynomials of the form
$x - \alpha$, where $\alpha$ is any complex number.


A proper rational fraction $\frac{f(x)}{g(x)}$ is called a partial fraction if
its denominator $g(x)$ is a power of an irreducible polynomial $p(x)$,

$$g(x) = p^k(x), \qquad k \geq 1$$

and the degree of the numerator $f(x)$ is less than that of $p(x)$.
The following fundamental theorem holds.

Any proper rational fraction can be decomposed into a sum of partial
fractions.

Proof. We first consider the proper rational fraction

$$\frac{f(x)}{g(x)\,h(x)}$$

where the polynomials $g(x)$ and $h(x)$ are relatively prime,

$$(g(x), h(x)) = 1$$

Thus, by Sec. 24, there are polynomials $\bar u(x)$ and $\bar v(x)$ such that

$$g(x)\,\bar u(x) + h(x)\,\bar v(x) = 1$$

Whence

$$g(x)\,[\bar u(x) f(x)] + h(x)\,[\bar v(x) f(x)] = f(x) \tag{2}$$



Suppose, in dividing the product $\bar u(x) f(x)$ by $h(x)$, we get a remainder
$u(x)$ whose degree is less than the degree of $h(x)$. Then (2) may
be rewritten in the form

$$g(x)\,u(x) + h(x)\,v(x) = f(x) \tag{3}$$

where $v(x)$ is a polynomial whose expression could readily be written.
Since the degree of the product $g(x)u(x)$ is less than the degree
of the product $g(x)h(x)$ and this, by hypothesis, is true for the
polynomial $f(x)$, it follows that the product $h(x)v(x)$ also has
degree less than that of $g(x)h(x)$, and therefore the degree of $v(x)$
is less than that of $g(x)$. From (3) there now follows the equation

$$\frac{f(x)}{g(x)\,h(x)} = \frac{v(x)}{g(x)} + \frac{u(x)}{h(x)}$$

the right member of which is a sum of proper fractions.

If even one of the denominators $g(x)$, $h(x)$ can be factored into
a product of relatively prime factors, then a further decomposition is
possible. Continuing in the same manner, we find that any proper fraction
can be decomposed into a sum of several proper fractions, each of which
has for the denominator a power of some irreducible polynomial. More
precisely, if we are given a proper fraction $\frac{f(x)}{g(x)}$, whose denominator
can be factored into the irreducible factors

$$g(x) = p_1^{k_1}(x)\, p_2^{k_2}(x) \cdots p_l^{k_l}(x)$$

(of course, one can always say that the leading coefficient of the
denominator of a rational fraction is unity), and $p_i(x) \neq p_j(x)$
for $i \neq j$, then it follows that

$$\frac{f(x)}{g(x)} = \frac{u_1(x)}{p_1^{k_1}(x)} + \frac{u_2(x)}{p_2^{k_2}(x)} + \dots + \frac{u_l(x)}{p_l^{k_l}(x)}$$

All the terms on the right of this equation are proper fractions.

It remains to consider a proper fraction of the form $\frac{u(x)}{p^k(x)}$, where
$p(x)$ is an irreducible polynomial. Applying the division algorithm,
divide $u(x)$ by $p^{k-1}(x)$, divide the remainder by $p^{k-2}(x)$, and so on.
We arrive at the following equalities:

$$u(x) = p^{k-1}(x)\,s_1(x) + u_1(x),$$
$$u_1(x) = p^{k-2}(x)\,s_2(x) + u_2(x),$$
$$\dots\dots\dots$$
$$u_{k-2}(x) = p(x)\,s_{k-1}(x) + u_{k-1}(x)$$

Since the degree of $u(x)$ is, by hypothesis, less than the degree of
$p^k(x)$, and the degree of each of the remainders $u_i(x)$, $i = 1, 2,
\dots, k - 1$, is less than the degree of the corresponding divisor
$p^{k-i}(x)$, it follows that the degrees of all quotients $s_1(x), s_2(x),
\dots, s_{k-1}(x)$ will be strictly less than the degree of the polynomial
$p(x)$. The degree of the last remainder $u_{k-1}(x)$ is also less than the
degree of $p(x)$. It follows from the equations obtained that

$$u(x) = p^{k-1}(x)\,s_1(x) + p^{k-2}(x)\,s_2(x) + \dots + p(x)\,s_{k-1}(x) + u_{k-1}(x)$$

whence we arrive at the desired representation of the rational fraction
$\frac{u(x)}{p^k(x)}$ as a sum of partial fractions:

$$\frac{u(x)}{p^k(x)} = \frac{u_{k-1}(x)}{p^k(x)} + \frac{s_{k-1}(x)}{p^{k-1}(x)} + \dots + \frac{s_2(x)}{p^2(x)} + \frac{s_1(x)}{p(x)}$$
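
The last step can be sketched in code. This is an illustrative sketch only: the helpers `trim`, `poly_divmod` and `partial_numerators` are names invented here, polynomials are stored as coefficient lists from the highest degree down, and the routine divides repeatedly by $p(x)$ itself rather than by $p^{k-1}(x), p^{k-2}(x), \dots$ as in the text. Under the assumption that the degree of $u(x)$ is less than that of $p^k(x)$, both routes yield numerators of degree less than that of $p(x)$, so by the uniqueness theorem stated next they give the same decomposition.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zeros; coefficients run from the highest degree down."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_divmod(u, d):
    """Return (q, r) with u = d*q + r and deg r < deg d (d must be nonzero)."""
    u = [Fraction(c) for c in trim(u)]
    d = [Fraction(c) for c in trim(d)]
    if len(u) < len(d):
        return [Fraction(0)], u
    steps = len(u) - len(d) + 1
    q, r = [], u[:]
    for i in range(steps):
        c = r[i] / d[0]
        q.append(c)
        for j, dc in enumerate(d):
            r[i + j] -= c * dc
    remainder = r[steps:] or [Fraction(0)]
    return q, trim(remainder)

def partial_numerators(u, p, k):
    """Numerators [c_0, ..., c_{k-1}] such that
    u/p^k = c_0/p^k + c_1/p^(k-1) + ... + c_{k-1}/p, assuming deg u < k*deg p."""
    numerators, rest = [], u
    for _ in range(k):
        rest, c = poly_divmod(rest, p)
        numerators.append(c)
    return numerators

# Illustrative data: u = x^3 + x^2 + 3x + 2, p = x^2 + 1, k = 2.
numerators = partial_numerators([1, 1, 3, 2], [1, 0, 1], 2)
```

For this sample input the result corresponds to $\frac{u}{p^2} = \frac{2x+1}{(x^2+1)^2} + \frac{x+1}{x^2+1}$, which can be confirmed by bringing the right side to a common denominator.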


The proof of the fundamental theorem is complete. It may be
supplemented by the following uniqueness theorem.

Every proper rational fraction has a unique decomposition into 
a sum of partial fractions. 

Let some proper fraction be decomposable into sums of partial
fractions in two ways. Subtracting one of these representations
from the other and collecting like terms, we get a sum of partial
fractions identically equal to zero. Let the denominators of the
partial fractions which constitute this sum be certain powers of
distinct irreducible polynomials $p_1(x), p_2(x), \dots, p_s(x)$ and let
the highest power of the polynomial $p_i(x)$, $i = 1, 2, \dots, s$, which
is one of these denominators, be $p_i^{k_i}(x)$. Multiply both sides of the
equality at hand by the product $p_1^{k_1 - 1}(x)\, p_2^{k_2}(x) \cdots p_s^{k_s}(x)$. Then
all the terms of our sum, except one, become polynomials. The term
$\frac{u(x)}{p_1^{k_1}(x)}$ is converted into a fraction whose denominator is $p_1(x)$
and whose numerator is the product $u(x)\, p_2^{k_2}(x) \cdots p_s^{k_s}(x)$. The
numerator is not exactly divisible by the denominator since the
polynomial $p_1(x)$ is irreducible, and all the factors of the numerator
are relatively prime to it. Performing division with a remainder,
we find that the sum of a polynomial and a nonzero proper fraction
is equal to zero, which is impossible.

Example. Decompose into a sum of partial fractions the real proper
fraction $\frac{f(x)}{g(x)}$, where

$$f(x) = 2x^4 - 10x^3 + 7x^2 + 4x + 3,$$
$$g(x) = x^5 - 2x^3 + 2x^2 - 3x + 2$$

It is easy to check that

$$g(x) = (x + 2)(x - 1)^2(x^2 + 1)$$

Each of the polynomials $x + 2$, $x - 1$, $x^2 + 1$ is irreducible. From the
foregoing theory it follows that the desired decomposition should be of
the form

$$\frac{f(x)}{g(x)} = \frac{A}{x + 2} + \frac{B}{(x - 1)^2} + \frac{C}{x - 1} + \frac{Dx + E}{x^2 + 1} \tag{4}$$

where the numbers $A$, $B$, $C$, $D$ and $E$ have still to be found.
From (4) follows the equation

$$f(x) = A(x - 1)^2(x^2 + 1) + B(x + 2)(x^2 + 1) + C(x + 2)(x - 1)(x^2 + 1) + Dx(x + 2)(x - 1)^2 + E(x + 2)(x - 1)^2 \tag{5}$$

Equating coefficients of like powers of the unknown $x$ in both members of (5),
we would get a system of five linear equations in five unknowns $A, B, C, D, E$;
and, as follows from what has been said, this system has a unique solution.
However, we will take a different approach.

Assuming $x = -2$ in (5), we get the equation $45A = 135$, whence

$$A = 3 \tag{6}$$

Putting $x = 1$ in (5), we get $6B = 6$, or

$$B = 1 \tag{7}$$

Now, in succession, set $x = 0$ and $x = -1$ in (5). Using (6) and (7), we get the
equations

$$-2C + 2E = -2, \qquad -4C - 4D + 4E = -8 \tag{8}$$

whence

$$D = 1 \tag{9}$$

Now, finally, set $x = 2$ in (5). Using (6), (7), and (9), we arrive at the equation

$$20C + 4E = -52$$

which, together with the first equation of (8), yields

$$C = -2, \qquad E = -3$$

Thus,

$$\frac{f(x)}{g(x)} = \frac{3}{x + 2} + \frac{1}{(x - 1)^2} - \frac{2}{x - 1} + \frac{x - 3}{x^2 + 1}$$
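
The result of the example can be confirmed with exact rational arithmetic. The sketch below is an illustration added here, with function names chosen for convenience. After multiplying through by $g(x)$, both sides are polynomials of degree at most 4, so agreement at five or more points other than $x = -2$ and $x = 1$ already implies identity; eight sample points are used.

```python
from fractions import Fraction

def f(x):
    return 2 * x**4 - 10 * x**3 + 7 * x**2 + 4 * x + 3

def g(x):
    return x**5 - 2 * x**3 + 2 * x**2 - 3 * x + 2

def decomposition(x):
    """Right side of the final formula with A=3, B=1, C=-2, D=1, E=-3."""
    return 3 / (x + 2) + 1 / (x - 1) ** 2 - 2 / (x - 1) + (x - 3) / (x**2 + 1)

# Sample points avoiding the real roots -2 and 1 of the denominator.
sample_points = [Fraction(n) for n in (-5, -3, -1, 0, 2, 3, 7)] + [Fraction(1, 2)]
mismatches = [x for x in sample_points if f(x) / g(x) != decomposition(x)]
```

An empty `mismatches` list means the two rational functions coincide.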


CHAPTER 6 


QUADRATIC FORMS 


26. Reducing a Quadratic Form to Canonical Form 


The genesis of the theory of quadratic forms lies in analytic 
geometry, namely, in the theory of quadric curves and surfaces. 
It will be recalled that the equation of a central quadric curve 
in a plane, after translating the origin of the rectangular coordinate 
system to the centre of the curve, is of the form 


$$Ax^2 + 2Bxy + Cy^2 = D \tag{1}$$

It is also possible to perform a rotation of the coordinate axes through
an angle $\alpha$, such that we have the following transformation from
the coordinates $x, y$ to the coordinates $x', y'$:

$$x = x'\cos\alpha - y'\sin\alpha, \qquad y = x'\sin\alpha + y'\cos\alpha \tag{2}$$

Then the equation of our curve in the new coordinates will be of
“canonical” form:

$$A'x'^2 + C'y'^2 = D \tag{3}$$

In this equation, the coefficient of the product of unknowns $x'y'$
is, thus, zero. The transformation of coordinates (2) may obviously
be interpreted as a linear transformation of the unknowns (see 
Sec. 13); the transformation is nonsingular since the determinant 
of its coefficients is equal to unity. This transformation is applied 
to the left side of (1) and for this reason we can say that the left 
member of (1) is converted into the left side of (3) by the nonsingular 
linear transformation (2). 
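
A small numerical sketch of this rotation follows. It is not taken from the text: the function name `rotate_to_canonical`, the choice of angle $\alpha = \tfrac12 \operatorname{atan2}(2B, A - C)$ and the sample form $5x^2 + 4xy + 2y^2$ are assumptions made here for illustration. The new coefficients are obtained by substituting (2) into the left side of (1).

```python
import math

def rotate_to_canonical(A, B, C):
    """Rotate A x^2 + 2B xy + C y^2 by an angle that removes the x'y' term.

    Returns (alpha, A1, B1, C1), where the rotated form is
    A1 x'^2 + 2*B1 x'y' + C1 y'^2 and B1 should vanish up to rounding.
    """
    alpha = 0.5 * math.atan2(2 * B, A - C)
    c, s = math.cos(alpha), math.sin(alpha)
    A1 = A * c * c + 2 * B * s * c + C * s * s
    B1 = (C - A) * s * c + B * (c * c - s * s)
    C1 = A * s * s - 2 * B * s * c + C * c * c
    return alpha, A1, B1, C1

# Illustrative form 5x^2 + 4xy + 2y^2, i.e. A = 5, B = 2, C = 2.
alpha, A1, B1, C1 = rotate_to_canonical(5, 2, 2)
determinant = math.cos(alpha) ** 2 + math.sin(alpha) ** 2  # determinant of (2)
```

For this sample the rotated form is $6x'^2 + y'^2$ up to rounding, and the determinant of the transformation equals unity, as stated.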

Numerous applications required the construction of a similar 
theory for the case when the number of unknowns is equal to an 
arbitrary n instead of two, and the coefficients are either real or any 
complex numbers. 

Generalizing the expression on the left of (1), we arrive at the 
following concept. 


A quadratic form $f$ in $n$ unknowns $x_1, x_2, \dots, x_n$ is a sum, each
term of which is either a square of one of the unknowns or a product
of two different unknowns. A quadratic form is called real or complex
according as its coefficients are real or complex numbers.

If we take it that like terms in the quadratic form $f$ have already
been collected, we can introduce the following notations for the
coefficients of this form: we denote by $a_{ii}$ the coefficient of $x_i^2$, and
by $2a_{ij}$ [compare with (1)!] the coefficient of the product $x_i x_j$ for
$i \neq j$. However, since $x_i x_j = x_j x_i$, the coefficient of this product
could be written as $2a_{ji}$, that is, the designations we have proposed
presume the validity of the equality

$$a_{ji} = a_{ij} \tag{4}$$

The term $2a_{ij} x_i x_j$ may now be written as

$$2a_{ij} x_i x_j = a_{ij} x_i x_j + a_{ji} x_j x_i$$

and the entire quadratic form $f$ may be written in the form of a sum
of all possible terms $a_{ij} x_i x_j$, where $i$ and $j$ independently take on the
values from 1 to $n$:

$$f = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \tag{5}$$

In particular, for $i = j$ we have the term $a_{ii} x_i^2$.

Obviously, we can construct a square matrix $A = (a_{ij})$ of order
$n$ out of the coefficients $a_{ij}$; it is called the matrix of the quadratic
form $f$, and its rank $r$ is called the rank of the quadratic form. If,
say, $r = n$, that is, the matrix is nonsingular, then the quadratic
form $f$ is termed nonsingular too. Due to (4), the elements of matrix $A$
which are symmetric about the principal diagonal are equal; that
is, matrix $A$ is a symmetric matrix. Conversely, for any symmetric
matrix $A$ of order $n$ there is a definite quadratic form (5) in $n$
unknowns having for coefficients the elements of the matrix $A$.

The quadratic form (5) may be written differently by using the 
multiplication of rectangular matrices introduced in Sec. 14. Let 
us make the following convention: if we have a square or, generally, 
rectangular matrix A, then A’ will denote the transpose of A. If 
matrices A and B are such that their product is defined, then we 
have the equality 


$$(AB)' = B'A' \tag{6}$$


Thus, the transpose of a product of matrices is equal to the product 
of the transposes of the matrices in reverse order. 

Indeed, if the product AB is defined, then, as may easily be 
verified, the product B’A’ will also be defined: the number of columns 
of matrix B’ is equal to the number of rows of matrix A’. The ele- 
ment of matrix (AB)’ in the ith row and jth column lies in the jth 
row and ith column of the matrix AB. It is therefore equal to the
sum of the products of the corresponding elements of the jth row 
of matrix A and the ith column of matrix B, which is to say it is 
equal to the sum of the products of the corresponding elements of 
the jth column of matrix A’ and the ith row of matrix B’. This 
proves (6). 

Note that the matrix $A$ is symmetric if and only if it coincides
with its transpose, i.e., if

$$A' = A$$

Now denote by $X$ the column made up of the unknowns:

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

$X$ is a matrix with $n$ rows and one column. Its transpose is the matrix

$$X' = (x_1, x_2, \dots, x_n)$$

comprising a single row.

The quadratic form (5) with matrix $A = (a_{ij})$ may now be written
as a product:

$$f = X'AX \tag{7}$$

Indeed, the product $AX$ will be a matrix consisting of one column:

$$AX = \begin{pmatrix} \sum_{j=1}^{n} a_{1j} x_j \\ \sum_{j=1}^{n} a_{2j} x_j \\ \vdots \\ \sum_{j=1}^{n} a_{nj} x_j \end{pmatrix}$$

Multiplying this matrix on the left by the matrix X’, we get a “mat- 
rix” consisting of one row and one column, namely, the right side 
of (5). 
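
The passage from the collected coefficients of a form to its symmetric matrix, and the evaluation of $X'AX$, may be sketched as follows. The helper names `form_matrix` and `evaluate`, and the sample form $2x_1x_2 - 6x_2x_3 + 2x_3x_1$, are illustrative assumptions and not notation from the text.

```python
from fractions import Fraction

def form_matrix(n, coeffs):
    """Symmetric matrix of a quadratic form in n unknowns.

    coeffs maps (i, j), 1-based with i <= j, to the collected coefficient
    of x_i x_j; an off-diagonal coefficient 2*a_ij is split evenly
    between a_ij and a_ji, in accordance with (4).
    """
    A = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), c in coeffs.items():
        if i == j:
            A[i - 1][i - 1] = Fraction(c)
        else:
            A[i - 1][j - 1] = A[j - 1][i - 1] = Fraction(c) / 2
    return A

def evaluate(A, x):
    """Value of X'AX, i.e. the double sum (5) of a_ij x_i x_j."""
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Illustrative form f = 2 x1 x2 - 6 x2 x3 + 2 x3 x1.
A = form_matrix(3, {(1, 2): 2, (2, 3): -6, (1, 3): 2})
value = evaluate(A, [1, 2, 3])   # 2*1*2 - 6*2*3 + 2*3*1
```

The matrix obtained is symmetric with zero diagonal, and the value of the form at $(1, 2, 3)$ agrees with direct substitution.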

What will happen to the quadratic form $f$ if the unknowns
$x_1, x_2, \dots, x_n$ in it are subjected to the linear transformation

$$x_i = \sum_{k=1}^{n} q_{ik} y_k, \qquad i = 1, 2, \dots, n \tag{8}$$

with the matrix $Q = (q_{ik})$? We will assume here that if the form $f$
is real, then the elements of the matrix $Q$ must be real. Denoting
by $Y$ the column of unknowns $y_1, y_2, \dots, y_n$, let us write the linear
transformation (8) in the form of a matrix equation:

$$X = QY \tag{9}$$

whence, from (6),

$$X' = Y'Q' \tag{10}$$

Substituting (9) and (10) into (7), we get

$$f = Y'(Q'AQ)Y$$

or

$$f = Y'BY$$

where

$$B = Q'AQ$$

The matrix $B$ is symmetric since, because of (6), which is
obviously true for any number of factors, and due to the equality
$A' = A$, which is equivalent to the symmetry of matrix $A$, we have

$$B' = Q'A'Q = Q'AQ = B$$

This proves the following theorem.

A quadratic form in n unknowns having a matrix A is converted
(after performing a linear transformation of the unknowns with matrix
Q) into a quadratic form in new unknowns, the product Q'AQ serving
as the matrix of this form.

Now assume that we perform a nonsingular linear transforma- 
tion; that is, Q and, therefore, Q’ too are nonsingular matrices. 
In this case, the product Q'AQ is obtained by multiplying matrix A 
by the nonsingular matrices; for this reason, as follows from the 
results of Sec. 14, the rank of this product is equal to the rank of 
matrix A. Thus, the rank of a quadratic form does not change under 
a nonsingular linear transformation. 
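
The theorem on the matrix $Q'AQ$ can be checked on a small case. The following sketch uses plain nested lists; the helpers `transpose`, `matmul` and `evaluate` and the particular matrices are illustrative choices made here, not part of the text.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def evaluate(A, x):
    """Value of the quadratic form with matrix A at the point x."""
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Matrix of the illustrative form 2 x1 x2 - 6 x2 x3 + 2 x3 x1.
A = [[0, 1, 1], [1, 0, -3], [1, -3, 0]]
# Transformation x1 = y1 - y2, x2 = y1 + y2, x3 = y3 (determinant 2).
Q = [[1, -1, 0], [1, 1, 0], [0, 0, 1]]

B = matmul(matmul(transpose(Q), A), Q)      # matrix of the transformed form

y = [2, -1, 5]
x = [sum(Q[i][k] * y[k] for k in range(3)) for i in range(3)]   # X = QY
same_value = evaluate(A, x) == evaluate(B, y)
```

Here `B` is again symmetric and corresponds to the form $2y_1^2 - 2y_2^2 - 4y_1y_3 - 8y_2y_3$; evaluating the old form at $X = QY$ and the new form at $Y$ gives the same number.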

By analogy with the geometric problem, indicated at the begin- 
ning of this section, of reducing the equation of a central quadric 
curve to canonical form (3), let us now consider the question of 
reducing an arbitrary quadratic form (by some nonsingular linear 
transformation) to a sum of squares of the unknowns, that is to say, 
to a form where all coefficients of products of distinct unknowns are 
zero. This special form of the quadratic form is called canonical. 
First, let us suppose that a quadratic form $f$ in $n$ unknowns $x_1, x_2,
\dots, x_n$ has already been reduced (via a nonsingular linear
transformation) to the canonical form

$$f = b_1 y_1^2 + b_2 y_2^2 + \dots + b_n y_n^2 \tag{11}$$

where $y_1, y_2, \dots, y_n$ are the new unknowns. Some of the coefficients
$b_1, b_2, \dots, b_n$ may of course be zeros. We will prove that
the number of nonzero coefficients in (11) is invariably equal to the
rank $r$ of the form $f$.

Indeed, since we reached (11) by means of a nonsingular transformation,
the quadratic form on the right of (11) must also be of
rank $r$. But the matrix of this quadratic form is diagonal:

$$\begin{pmatrix} b_1 & & & 0 \\ & b_2 & & \\ & & \ddots & \\ 0 & & & b_n \end{pmatrix}$$

and a requirement that this matrix have rank $r$ is equivalent to
supposing that its principal diagonal contains exactly $r$ nonzero
elements.

We now take up the proof of the following fundamental theorem
on quadratic forms.

Any quadratic form may be reduced to canonical form by means
of a nonsingular linear transformation. If a real quadratic form is under
consideration, then all the coefficients of this linear transformation may
be taken to be real.

This theorem is true for the case of quadratic forms in one unknown
since every such form has the form $ax^2$, which is canonical.
We can therefore carry out the proof by induction with respect to
the number of unknowns; that is, we can prove the theorem for
quadratic forms in $n$ unknowns, assuming it proved for forms with
a smaller number of unknowns.

Suppose we have the quadratic form

$$f = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \tag{12}$$

in the $n$ unknowns $x_1, x_2, \dots, x_n$. We try to find a nonsingular
linear transformation that isolates from $f$ a square of one of the
unknowns, that is, one that reduces $f$ to the form of a sum of this
square and some quadratic form in the remaining unknowns. This
is readily achieved if among the coefficients $a_{11}, a_{22}, \dots, a_{nn}$ in
the principal diagonal of the matrix of the form $f$ there are some
nonzero coefficients, that is to say, if the square of at least one of the
unknowns $x_i$ enters into (12) with a nonzero coefficient.

For example, let $a_{11} \neq 0$. Then it will be easy to see that the
expression $a_{11}^{-1}(a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n)^2$, which is a quadratic
form, contains the same terms with the unknown $x_1$ as our form
$f$, and so the difference

$$f - a_{11}^{-1}(a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n)^2 = g$$

is a quadratic form containing only the unknowns $x_2, \dots, x_n$,
but not $x_1$. Whence

$$f = a_{11}^{-1}(a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n)^2 + g$$

If we introduce the designations

$$y_1 = a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n, \qquad y_i = x_i \quad \text{for } i = 2, 3, \dots, n \tag{13}$$

we obtain

$$f = a_{11}^{-1} y_1^2 + g \tag{14}$$

where $g$ is now a quadratic form in the unknowns $y_2, y_3, \dots, y_n$.
Expression (14) is the desired expression for the form $f$, since it was
obtained from (12) by a nonsingular linear transformation, namely,
by a transformation inverse to the linear transformation (13), which
has $a_{11}$ for its determinant and is therefore not singular.

However, if we have the equalities $a_{11} = a_{22} = \dots = a_{nn} = 0$,
then we first have to perform an auxiliary linear transformation
that leads to the appearance, in our form $f$, of squares of the
unknowns. Since there must be nonzero coefficients among those in
(12) of this form (otherwise there would be nothing to prove),
suppose, say, that $a_{12} \neq 0$, i.e., $f$ is the sum of the term $2a_{12}x_1x_2$
and of terms such that each contains at least one of the unknowns
$x_3, \dots, x_n$.

Let us now perform the linear transformation

$$x_1 = z_1 - z_2, \qquad x_2 = z_1 + z_2, \qquad x_i = z_i \quad \text{for } i = 3, \dots, n \tag{15}$$

It will be nonsingular since it has the determinant

$$\begin{vmatrix} 1 & -1 & 0 & \dots & 0 \\ 1 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{vmatrix} = 2 \neq 0$$

As a result of this transformation, the term $2a_{12}x_1x_2$ of our form
becomes

$$2a_{12}x_1x_2 = 2a_{12}(z_1 - z_2)(z_1 + z_2) = 2a_{12}z_1^2 - 2a_{12}z_2^2$$

In other words, in form $f$ there will appear the squares of two
unknowns at once with nonzero coefficients; what is more, they do
not cancel with any one of the remaining terms, since each one
of the latter contains at least one of the unknowns $z_3, \dots, z_n$.
We are now in the conditions of the case that has already been
considered; one more nonsingular linear transformation will reduce
the form $f$ to the form (14).

To conclude the proof, note that the quadratic form $g$ depends
on a smaller (than $n$) number of unknowns and for this reason, by
the induction hypothesis, it is reducible to the canonical form by
means of a nonsingular transformation of the unknowns $y_2, y_3, \dots,
y_n$. This transformation, which we regard as a (nonsingular,
quite obviously) transformation of all $n$ unknowns under which
$y_1$ remains unchanged, consequently reduces (14) to canonical form.
Thus, by means of two or three nonsingular linear transformations,
which may be replaced by a single nonsingular transformation
(their product), a quadratic form $f$ may be reduced to a sum of squares
of the unknowns with certain coefficients. And, as we know, the
number of such squares is equal to the rank $r$ of the form. If, besides,
the quadratic form $f$ is real, then the coefficients both in the canonical
form of $f$ and in the linear transformation which reduces $f$
to this canonical form will be real; indeed, both the linear transformation
which is inverse to (13) and the linear transformation
(15) have real coefficients.

The proof of the fundamental theorem is complete. The method 
employed in this proof can be used in specific examples for an actual 
reduction of a quadratic form to canonical form. It is only necessary, 
in place of the induction we used in the proof, to isolate the squares 
of the unknowns successively by the method given above. 
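
The successive isolation of squares can be sketched as a short routine. This is an illustrative sketch under stated assumptions, not the text's own procedure verbatim: the function name `reduce_to_canonical` and its helpers are invented here; the square is isolated with the substitution $y_k = x_k + \sum_{j>k} (a_{kj}/a_{kk}) x_j$, which differs from (13) only by the scale of $y_k$ (so the coefficient obtained is $a_{kk}$ rather than $a_{kk}^{-1}$); and when a diagonal entry vanishes, the routine first tries renumbering the unknowns and only then uses a transformation of type (15).

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def reduce_to_canonical(A):
    """Return (coeffs, Q) with Q'AQ = diag(coeffs) for a symmetric matrix A.

    Q is the matrix of a nonsingular transformation X = Q T.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    Q = identity(n)
    for k in range(n):
        if M[k][k] == 0:
            E = identity(n)
            swap = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            mixed = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
            if swap is not None:
                # renumber the unknowns so that a nonzero square comes first
                E[k][k] = E[swap][swap] = Fraction(0)
                E[k][swap] = E[swap][k] = Fraction(1)
            elif mixed is not None:
                # type (15): x_k = z_k - z_j, x_j = z_k + z_j
                E[k][mixed] = Fraction(-1)
                E[mixed][k] = Fraction(1)
            else:
                continue  # the k-th unknown no longer occurs in the form
            M, Q = matmul(matmul(transpose(E), M), E), matmul(Q, E)
        # isolate the square of the k-th unknown
        E = identity(n)
        for j in range(k + 1, n):
            E[k][j] = -M[k][j] / M[k][k]
        M, Q = matmul(matmul(transpose(E), M), E), matmul(Q, E)
    return [M[i][i] for i in range(n)], Q

# Illustrative form 2 x1 x2 - 6 x2 x3 + 2 x3 x1.
A = [[0, 1, 1], [1, 0, -3], [1, -3, 0]]
coeffs, Q = reduce_to_canonical(A)
check = matmul(matmul(transpose(Q), A), Q)
```

For this sample the routine returns the canonical form $2t_1^2 - 2t_2^2 + 6t_3^2$. The coefficients depend on the scaling convention chosen, which is consistent with the fact that a canonical form is not uniquely determined.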


Example. Reduce to canonical form the quadratic form

$$f = 2x_1x_2 - 6x_2x_3 + 2x_3x_1 \tag{16}$$

Since there are no squares of the unknowns in this form, we first perform
a nonsingular linear transformation

$$x_1 = y_1 - y_2, \qquad x_2 = y_1 + y_2, \qquad x_3 = y_3$$

with the matrix

$$A = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

This yields

$$f = 2y_1^2 - 2y_2^2 - 4y_1y_3 - 8y_2y_3$$

Now the coefficient of $y_1^2$ is nonzero, and so we can isolate the square of one
unknown. Setting

$$z_1 = 2y_1 - 2y_3, \qquad z_2 = y_2, \qquad z_3 = y_3$$

that is, performing a linear transformation, the inverse of which has the matrix

$$B = \begin{pmatrix} \frac{1}{2} & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

we reduce $f$ to the form

$$f = \frac{1}{2}z_1^2 - 2z_2^2 - 2z_3^2 - 8z_2z_3$$

So far only the square of the unknown $z_1$ has been isolated, since the form
still contains the product of two other unknowns. Using the fact that the
coefficient of $z_2^2$ is nonzero, we again apply the method described above.
Performing the linear transformation

$$t_1 = z_1, \qquad t_2 = -2z_2 - 4z_3, \qquad t_3 = z_3$$

the inverse of which has the matrix

$$C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & -2 \\ 0 & 0 & 1 \end{pmatrix}$$

we finally reduce the form $f$ to canonical form:

$$f = \frac{1}{2}t_1^2 - \frac{1}{2}t_2^2 + 6t_3^2 \tag{17}$$

The linear transformation that immediately reduces (16) to (17) will have
for its matrix the product

$$ABC = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 3 \\ \frac{1}{2} & -\frac{1}{2} & -1 \\ 0 & 0 & 1 \end{pmatrix}$$

It is also possible, by direct substitution, to verify that the nonsingular
(since the determinant is equal to $-\frac{1}{2}$) linear transformation

$$x_1 = \frac{1}{2}t_1 + \frac{1}{2}t_2 + 3t_3,$$
$$x_2 = \frac{1}{2}t_1 - \frac{1}{2}t_2 - t_3,$$
$$x_3 = t_3$$

converts (16) into (17).
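
The example can be verified mechanically with exact fractions. The sketch below is an added illustration; `M` denotes the symmetric matrix of form (16), and the helper names are chosen here for convenience.

```python
from fractions import Fraction as F

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# Symmetric matrix of f = 2 x1 x2 - 6 x2 x3 + 2 x3 x1.
M = [[0, 1, 1], [1, 0, -3], [1, -3, 0]]

# Matrices of the three successive transformations of the example.
A = [[1, -1, 0], [1, 1, 0], [0, 0, 1]]
B = [[F(1, 2), 0, 1], [0, 1, 0], [0, 0, 1]]
C = [[1, 0, 0], [0, F(-1, 2), -2], [0, 0, 1]]

Q = matmul(matmul(A, B), C)                     # the product ABC
canonical = matmul(matmul(transpose(Q), M), Q)  # matrix of the form in t1, t2, t3
```

The product `Q` reproduces the matrix $ABC$ of the example, and `canonical` is the diagonal matrix with entries $\frac12, -\frac12, 6$, in agreement with (17).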


The theory of reducing a quadratic form to canonical form is 
based on an analogy with the geometric theory of central quadric 
curves but it cannot be considered a generalization of this latter 
theory. Actually, in our theory we are allowed to use any nonsin- 
gular linear transformations, whereas reducing a quadric to canoni- 
cal form is achieved by applying linear transformations of a very 
special kind (2); these transformations are rotations of the plane. 
However, this geometric theory can be generalized to the case of 
quadratic forms in n unknowns with real coefficients. The genera- 
lization, which goes by the name of reduction of quadratic forms 
to principal axes, will be given in Chapter 8. 
27. Law of Inertia


The canonical form to which a given quadratic form is reduced
is by no means uniquely determined: any quadratic form may be
reduced to canonical form in many different ways. Thus, the quadratic
form $f = 2x_1x_2 - 6x_2x_3 + 2x_3x_1$ that was considered in the
preceding section can, by the following nonsingular linear
transformation,

$$x_1 = t_1 + 3t_2 + 2t_3,$$
$$x_2 = t_1 - t_2 - 2t_3,$$
$$x_3 = t_2$$

be reduced to the canonical form

$$f = 2t_1^2 + 6t_2^2 - 8t_3^2$$

which is different from the earlier obtained form.

The question arises as to what these different canonical quadratic
forms to which the given form $f$ is reduced have in common.
As we shall see, this question is closely connected with the following
one: under what condition can one of the two given quadratic forms
be carried into the other by a nonsingular linear transformation?
The answer depends on whether we are considering complex or real
quadratic forms.

First suppose we are considering arbitrary complex quadratic
forms; at the same time, let us assume we admit the use of nonsingular
linear transformations also with arbitrary complex coefficients.
We know that any quadratic form $f$ in $n$ unknowns having
rank $r$ can be reduced to the canonical form

$$f = c_1y_1^2 + c_2y_2^2 + \dots + c_ry_r^2$$

where all the coefficients $c_1, c_2, \dots, c_r$ are nonzero. Using the fact
that we can take the square root of any complex number, let us
perform the following nonsingular linear transformation:

$$z_i = \sqrt{c_i}\, y_i \ \text{ for } i = 1, 2, \dots, r; \qquad z_j = y_j \ \text{ for } j = r + 1, \dots, n$$

It reduces $f$ to the form

$$f = z_1^2 + z_2^2 + \dots + z_r^2 \tag{1}$$

which is called normal. This is simply the sum of the squares of $r$
unknowns with coefficients equal to unity.

The normal form depends solely on the rank $r$ of the form $f$,
that is, all quadratic forms of rank $r$ can be reduced to one and the
same normal form (1). Consequently, if forms $f$ and $g$ in $n$ unknowns
have the same rank $r$, then we can transform $f$ to (1) and then (1)
to $g$; in other words, there exists a nonsingular linear transformation
that takes f into g. Since, on the other hand, no nonsingular linear
transformation alters the rank of the form, we arrive at the following
result.

Two complex quadratic forms in n unknowns can be carried one
into the other by means of nonsingular linear transformations with
complex coefficients if and only if these forms have one and the same
rank.

It very easily follows from this theorem that any sum of squares 
of r unknowns with any nonzero complex coefficients can serve as the 
canonical form of a complex quadratic form of rank r. 

The situation is somewhat more complicated if we consider 
real quadratic forms and—this is particularly important—if we 
allow only for linear transformations with real coefficients. Now 
not every form can be reduced to (1), since this might require taking 
the square root of a negative number. However, if we now use the 
term normal form of a quadratic form for the sum of squares of several
unknowns with coefficients +1 or -1, then it is easy to show
that any real quadratic form f may be reduced to the normal form via 
a nonsingular linear transformation with real coefficients. 

Indeed, the form $f$ of rank $r$ in $n$ unknowns can be reduced to
a canonical form that can be written as follows (the numbering
of the unknowns may be changed if necessary):

$$f = c_1y_1^2 + \dots + c_ky_k^2 - c_{k+1}y_{k+1}^2 - \dots - c_ry_r^2, \qquad 0 \leq k \leq r$$

where all the numbers $c_1, \dots, c_k, c_{k+1}, \dots, c_r$ are nonzero and
positive. Then the nonsingular linear transformation with real
coefficients

$$z_i = \sqrt{c_i}\, y_i \ \text{ for } i = 1, 2, \dots, r; \qquad z_j = y_j \ \text{ for } j = r + 1, \dots, n$$

reduces $f$ to normal form:

$$f = z_1^2 + \dots + z_k^2 - z_{k+1}^2 - \dots - z_r^2$$

The total number of squares here is equal to the rank of the form.

A real quadratic form may be reduced to normal form by many 
different transformations; however, to within the numbering of the 
unknowns, it can be reduced only to one normal form. This is demonstrated
by the following important theorem, which is called the
law of inertia of real quadratic forms. 

The number of positive and the number of negative squares in the 
normal form to which a given quadratic form with real coefficients 
can be reduced by a real nonsingular linear transformation is inde- 
pendent of the choice of the transformation. 

Indeed, let a quadratic form $f$ of rank $r$ in $n$ unknowns $x_1, x_2, \dots,
x_n$ be reduced to the following normal form in two ways:

$$f = y_1^2 + \dots + y_k^2 - y_{k+1}^2 - \dots - y_r^2 = z_1^2 + \dots + z_l^2 - z_{l+1}^2 - \dots - z_r^2 \tag{2}$$

Since the transition from the unknowns $x_1, x_2, \dots, x_n$ to the
unknowns $y_1, y_2, \dots, y_n$ was a nonsingular linear transformation,
it follows, conversely, that the second set of unknowns will also
be expressed linearly in terms of the first set with a nonzero
determinant:

$$y_i = \sum_{s=1}^{n} a_{is} x_s, \qquad i = 1, 2, \dots, n \tag{3}$$

Similarly,

$$z_j = \sum_{t=1}^{n} b_{jt} x_t, \qquad j = 1, 2, \dots, n \tag{4}$$

the determinant of the coefficients again being different from zero.
The coefficients are real numbers both in (3) and in (4).

Now suppose that $k < l$. Write the system of equalities

$$y_1 = 0, \ \dots, \ y_k = 0, \ z_{l+1} = 0, \ \dots, \ z_r = 0, \ \dots, \ z_n = 0 \tag{5}$$

If the left members of these equalities are replaced by their expressions
taken from (3) and (4), we get a system of $n - l + k$ homogeneous
linear equations in $n$ unknowns $x_1, x_2, \dots, x_n$. The number
of equations in this system is less than the number of unknowns.
For this reason, as we know from Sec. 4, our system has a nonzero
real solution $\alpha_1, \alpha_2, \dots, \alpha_n$.

Now in (2) let us replace all $y$'s and all $z$'s by their expressions
(3) and (4), and then let us substitute for the unknowns the numbers
$\alpha_1, \alpha_2, \dots, \alpha_n$. If for brevity the values of the unknowns $y_i$ and
$z_j$ obtained in this substitution are denoted by $y_i(\alpha)$ and $z_j(\alpha)$,
then, by (5), (2) becomes

$$-y_{k+1}^2(\alpha) - \dots - y_r^2(\alpha) = z_1^2(\alpha) + \dots + z_l^2(\alpha) \tag{6}$$

Since all the coefficients in (3) and (4) are real, all the squares in
(6) are nonnegative, and for this reason (6) implies that all these squares
are zero, whence follow the equalities

$$z_1(\alpha) = 0, \ \dots, \ z_l(\alpha) = 0 \tag{7}$$

On the other hand, by the very choice of the numbers $\alpha_1, \alpha_2, \dots, \alpha_n$,

$$z_{l+1}(\alpha) = 0, \ \dots, \ z_r(\alpha) = 0, \ \dots, \ z_n(\alpha) = 0 \tag{8}$$

Thus, the system of $n$ homogeneous linear equations

$$z_i = 0, \qquad i = 1, 2, \dots, n$$

in $n$ unknowns $x_1, x_2, \dots, x_n$ has, by (7) and (8), the nontrivial
solution $\alpha_1, \alpha_2, \dots, \alpha_n$; that is, the determinant of this system
must be zero. This however contradicts the fact that the transformation
(4) was presumed to be nonsingular. We have the same contradiction
for $l < k$, whence follows $k = l$, which proves the theorem.

The number of positive squares in the normal form to which
a given real quadratic form f is reduced is called the positive index
of inertia of this form; the number of negative squares is termed
the negative index of inertia, and the difference between the positive
and the negative indices of inertia is the
signature of the form f. Clearly, if we are given the rank of a form,
any one of the three numbers just defined will fully determine the 
other two, and for this reason, we can speak of any one of the three 
numbers in subsequent formulations. 

We now prove the following theorem. 

Two quadratic forms in n unknowns with real coefficients are carried
one into the other by real nonsingular linear transformations if and
only if the forms have the same ranks and the same signatures.

Indeed, let a form f be carried into a form g by a real nonsingular
transformation. We know that this transformation does not alter the
rank of the form. Neither can it change the signature: otherwise f and
g would reduce to different normal forms, and then f would reduce, in
conflict with the law of inertia, to both of these normal forms.
Conversely, if the forms f and g have the same ranks and the same
signatures, then they reduce to one and the same normal form and
therefore can be carried into one another.

If we have a quadratic form g in canonical form with nonzero
coefficients

$$g = b_1 y_1^2 + b_2 y_2^2 + \ldots + b_r y_r^2 \qquad (9)$$

then the rank of this form is obviously equal to r. Taking advantage
of the procedure used earlier of reducing such a form to the normal
form, it is easy to see that the positive index of inertia of form g
is equal to the number of positive coefficients in the right member
of (9). From this and from the preceding theorem we obtain the
following result.

A quadratic form f has form (9) as its canonical form if and only 
if the rank of f is equal to r and the positive index of inertia of this 
form coincides with the number of positive coefficients in (9). 
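
The result above reduces the computation of the rank and the signature
to counting the signs of canonical coefficients. The following is a
minimal Python sketch of that idea, not a procedure taken from the
text: the names `inertia`, `_swap` and `_add` are illustrative, and the
reduction to a diagonal (canonical) form is assumed to be done by
applying the same elementary operation to rows and to columns, which
corresponds to a nonsingular linear transformation of the unknowns.

```python
from fractions import Fraction


def _swap(m, i, j):
    """Interchange unknowns i and j (swap rows, then columns)."""
    m[i], m[j] = m[j], m[i]
    for row in m:
        row[i], row[j] = row[j], row[i]


def _add(m, src, dst, c):
    """Add c times row src to row dst, then do the same with columns."""
    n = len(m)
    for t in range(n):
        m[dst][t] += c * m[src][t]
    for t in range(n):
        m[t][dst] += c * m[t][src]


def inertia(matrix):
    """Return (positive index, negative index) of a real symmetric matrix."""
    n = len(matrix)
    m = [[Fraction(x) for x in row] for row in matrix]
    for k in range(n):
        if m[k][k] == 0:
            # Try to bring a nonzero diagonal entry into position k.
            j = next((j for j in range(k + 1, n) if m[j][j] != 0), None)
            if j is not None:
                _swap(m, k, j)
            else:
                # All remaining diagonal entries vanish; use a product term.
                pair = next(((i, j) for i in range(k, n)
                             for j in range(i + 1, n) if m[i][j] != 0), None)
                if pair is None:
                    break  # the remaining block is zero
                i, j = pair
                _add(m, j, i, 1)  # diagonal entry i becomes 2 * m[i][j]
                _swap(m, k, i)
        for i in range(k + 1, n):
            if m[i][k] != 0:
                _add(m, k, i, -m[i][k] / m[k][k])
    positive = sum(1 for k in range(n) if m[k][k] > 0)
    negative = sum(1 for k in range(n) if m[k][k] < 0)
    return positive, negative
```

If `inertia` returns the pair `(p, q)`, the rank is `p + q` and the
signature is `p - q`; two real forms can be carried into one another
exactly when these pairs coincide.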

Decomposable quadratic forms. By multiplying any two linear
forms in n unknowns,

$$\varphi = a_1 x_1 + a_2 x_2 + \ldots + a_n x_n, \qquad \psi = b_1 x_1 + b_2 x_2 + \ldots + b_n x_n,$$

we obviously get a quadratic form. Not every quadratic form
can be represented as a product of two linear forms and we wish
to derive the conditions under which this occurs, that is, the
conditions under which a quadratic form is decomposable.

A complex quadratic form $f(x_1, x_2, \ldots, x_n)$ is decomposable
if and only if its rank is less than or equal to two. A real quadratic
form $f(x_1, x_2, \ldots, x_n)$ is decomposable if and only if either
its rank does not exceed unity or the rank is equal to two and the
signature is zero.

Let us first consider the product of the linear forms $\varphi$ and
$\psi$. If at least one of them is a zero form, then their product will
be a quadratic form with zero coefficients, which means it has rank 0.
If the linear forms $\varphi$ and $\psi$ are proportional,

$$\psi = c\varphi$$

and $c \neq 0$ and the form $\varphi$ is nonzero, then, for example,
let the coefficient $a_1$ be different from zero. Then the nonsingular
linear transformation

$$y_1 = a_1 x_1 + \ldots + a_n x_n, \qquad y_i = x_i \quad \text{for } i = 2, 3, \ldots, n$$

reduces the quadratic form $\varphi\psi$ to

$$\varphi\psi = c y_1^2$$

On the right is a quadratic form of rank 1, and so the quadratic
form $\varphi\psi$ has rank 1. Finally, if the linear forms $\varphi$
and $\psi$ are not proportional then, say, let

$$\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \neq 0$$

Then the linear transformation

$$\begin{aligned}
y_1 &= a_1 x_1 + a_2 x_2 + \ldots + a_n x_n, \\
y_2 &= b_1 x_1 + b_2 x_2 + \ldots + b_n x_n, \\
y_i &= x_i \quad \text{for } i = 3, 4, \ldots, n
\end{aligned}$$

will be nonsingular; it reduces the quadratic form $\varphi\psi$ to

$$\varphi\psi = y_1 y_2$$

On the right is a quadratic form of rank 2, which in the case of real
coefficients has a signature of 0.

Let us now prove the converse. A quadratic form of rank 0 can
of course be regarded as a product of two linear forms, one of which
is a zero form. Next, a quadratic form $f(x_1, x_2, \ldots, x_n)$ of
rank 1 is reduced by a nonsingular linear transformation to

$$f = c y_1^2, \quad c \neq 0$$

that is, to the form

$$f = (c y_1)\, y_1$$

Expressing $y_1$ linearly in terms of $x_1, x_2, \ldots, x_n$, we get
a representation of the form f as a product of two linear forms.
Finally, the real quadratic form $f(x_1, x_2, \ldots, x_n)$ of rank 2
and signature 0 is reduced by a nonsingular linear transformation to

$$f = y_1^2 - y_2^2$$

Any complex quadratic form of rank 2 can be reduced to this same
form. However,

$$y_1^2 - y_2^2 = (y_1 - y_2)(y_1 + y_2)$$

and after replacing $y_1$ and $y_2$ by their linear expressions in
terms of $x_1, x_2, \ldots, x_n$, we will have on the right a product
of two linear forms. This proves the theorem.
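
The converse part of the proof can be followed on a small case. The
form below is an illustrative choice, not an example from the text; it
is a real form in two unknowns whose rank is 2 and whose signature is 0,
so the theorem predicts a factorization into real linear forms.

```latex
f = x_1^2 + 3x_1x_2 + 2x_2^2
  = \Bigl(x_1 + \tfrac{3}{2}x_2\Bigr)^2 - \Bigl(\tfrac{1}{2}x_2\Bigr)^2 .

% Nonsingular transformation to the normal form:
y_1 = x_1 + \tfrac{3}{2}x_2, \qquad y_2 = \tfrac{1}{2}x_2,
\qquad f = y_1^2 - y_2^2 .

% Factor and return to the original unknowns:
f = (y_1 - y_2)(y_1 + y_2) = (x_1 + x_2)(x_1 + 2x_2).
```

Multiplying out the last product gives back
$x_1^2 + 3x_1x_2 + 2x_2^2$, as expected.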


28. Positive Definite Forms 


A quadratic form f in n unknowns with real coefficients is called
positive definite if it can be reduced to a normal form consisting
of n positive squares, that is, if both the rank and the positive
index of inertia of this form are equal to the number of unknowns.

The following theorem enables us to characterize positive definite 
forms without reducing them to normal form or canonical form. 

A quadratic form f in n unknowns $x_1, x_2, \ldots, x_n$ with real
coefficients is positive definite if and only if for all real values
of the unknowns, at least one of which is nonzero, the form receives
positive values.

Proof. Let the form f be positive definite, i.e., reducible to the
normal form

$$f = y_1^2 + y_2^2 + \ldots + y_n^2 \qquad (1)$$

and let the reduction be carried out by the linear transformation

$$y_i = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, 2, \ldots, n \qquad (2)$$

with a nonzero determinant of the real coefficients $a_{ij}$. If we
want to substitute, into f, arbitrary real values of the unknowns
$x_1, x_2, \ldots, x_n$, at least one of which is nonzero, then we can
first substitute them into (2) and then substitute the values obtained
for all $y_i$ into (1). It will be noted that the values obtained from
(2) for $y_1, y_2, \ldots, y_n$ cannot all be zero at once, for then we
would have that the system of homogeneous linear equations

$$\sum_{j=1}^{n} a_{ij} x_j = 0, \quad i = 1, 2, \ldots, n$$

has a nontrivial solution, though its determinant is different from
zero. Substituting the values found for $y_1, y_2, \ldots, y_n$ into
(1), we get the value of the form f equal to the sum of the squares of
n real numbers, not all zero. This value will consequently be strictly
positive.

Conversely, suppose the form f is not positive definite, that is
either its rank or the positive index of inertia is less than n. This
means that in the normal form of f, to which it is reduced, say, by
the nonsingular linear transformation (2), the square of at least one
of the new unknowns, say $y_n$, is either absent altogether or is
present with a minus sign. We will show that in this case it is
possible to choose real values for the unknowns
$x_1, x_2, \ldots, x_n$, not all zero, such that the value of the form
f for these values of the unknowns is equal to zero or is even
negative. Such, for instance, are the values for
$x_1, x_2, \ldots, x_n$ which we obtain when solving, by Cramer's rule,
the system of linear equations obtained from (2) for
$y_1 = y_2 = \ldots = y_{n-1} = 0$, $y_n = 1$. Indeed, for these values
of the unknowns $x_1, x_2, \ldots, x_n$, the form f is zero if $y_n^2$
does not enter into the normal form of f, and is equal to -1 if $y_n^2$
enters into the normal form with a minus sign.

The theorem that has just been proved is used wherever positive 
definite quadratic forms are employed. However, it cannot be used 
to establish from the coefficients whether a form is positive definite 
or not. This is handled by a different theorem which we will state 
and prove after introducing an auxiliary notion. 

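Suppose we have a quadratic form f in n unknowns with the
matrix $A = (a_{ij})$. The minors of order 1, 2, ..., n of this matrix
situated in the upper left corner, that is, the minors

$$a_{11}, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \ldots, \quad
\begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1k} \\ a_{21} & a_{22} & \ldots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \ldots & a_{kk} \end{vmatrix}, \quad \ldots, \quad
\begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{vmatrix}$$

of which the last obviously coincides with the determinant of
matrix A, are called the principal minors of the form f.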

The following theorem holds true. 

A quadratic form f in n unknowns with real coefficients is posi- 
tive definite if and only if all its principal minors are strictly positive. 

Proof. For n = 1, the theorem is true since the form then is $ax^2$
and therefore is positive definite if and only if a > 0. For this
reason, we prove the theorem for the case of n unknowns on the
assumption that it has already been proved for quadratic forms in
n - 1 unknowns.

Note the following. 

If a quadratic form f with real coefficients constituting a mat- 
rix A is subjected to a nonsingular linear transformation with a real 
matrix Q, then the sign of the determinant of the form (that is, the 
determinant of its matrix) remains unchanged. 

Indeed, after the transformation we obtain a quadratic form
with the matrix $Q'AQ$; however, due to $|Q'| = |Q|$,

$$|Q'AQ| = |Q'| \cdot |A| \cdot |Q| = |A| \cdot |Q|^2$$

that is, the determinant $|A|$ is multiplied by a positive number.

Now suppose we have the quadratic form

$$f = \sum_{i,j=1}^{n} a_{ij} x_i x_j$$

It can be written as

$$f = \varphi(x_1, x_2, \ldots, x_{n-1}) + 2\sum_{i=1}^{n-1} a_{in} x_i x_n + a_{nn} x_n^2 \qquad (3)$$

where $\varphi$ is a quadratic form in n - 1 unknowns composed of those
terms of form f which do not contain the unknown $x_n$. The principal
minors of the form $\varphi$ evidently coincide with all principal
minors of the form f except the last.

Let the form f be positive definite. Then the form $\varphi$ will also
be positive definite: if there existed values of the unknowns
$x_1, x_2, \ldots, x_{n-1}$, not all zero, for which the form $\varphi$
receives a value that is not strictly positive, then, additionally
assuming $x_n = 0$, we would also obtain, by (3), a value of the form f
that is not strictly positive, although not all the values of the
unknowns $x_1, x_2, \ldots, x_{n-1}, x_n$ are equal to zero. For this
reason, by the induction hypothesis, all the principal minors of the
form $\varphi$, that is, all the principal minors of the form f except
the last, are strictly positive. As for the last principal minor of f
(that is, the determinant of the matrix A itself), its positivity is a
consequence of the following reasoning: because of its positive
definiteness, form f is reduced by a nonsingular linear transformation
to a normal form consisting of n positive squares. The determinant of
this normal form is strictly positive, and so, by the remark made
above, the determinant of the form f itself is positive.

Now let all the principal minors of the form f be strictly positive.
From this follows the positivity of all the principal minors of the
form $\varphi$, that is, by the induction hypothesis, the positive
definiteness of this form. Therefore, there is a nonsingular linear
transformation of the unknowns $x_1, x_2, \ldots, x_{n-1}$ that reduces
the form $\varphi$ to a sum of n - 1 positive squares in the new
unknowns $y_1, y_2, \ldots, y_{n-1}$. By setting $x_n = y_n$, this
linear transformation may be completed to form a (nonsingular) linear
transformation of all the unknowns $x_1, x_2, \ldots, x_n$. By (3),
form f is reduced by the indicated transformation to

$$f = \sum_{i=1}^{n-1} y_i^2 + 2\sum_{i=1}^{n-1} b_{in} y_i y_n + b_{nn} y_n^2 \qquad (4)$$

The exact expressions of the coefficients $b_{in}$ are not essential
to us. Since

$$y_i^2 + 2 b_{in} y_i y_n = (y_i + b_{in} y_n)^2 - b_{in}^2 y_n^2$$

it follows that the nonsingular linear transformation

$$z_i = y_i + b_{in} y_n, \quad i = 1, 2, \ldots, n - 1, \qquad z_n = y_n$$

reduces the form f by (4) to the canonical form

$$f = \sum_{i=1}^{n-1} z_i^2 + c z_n^2 \qquad (5)$$

To prove the positive definiteness of the form f, it remains to
prove that the number c is positive. The determinant of the form
in the right member of (5) is equal to c. However, this determinant
should be positive since the right side of (5) is obtained from f by
two nonsingular linear transformations, and the determinant of
the form f was positive (being the last of the principal minors of
this form).

This completes the proof of the theorem. 


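Example 1. The quadratic form

$$f = 5x_1^2 + x_2^2 + 5x_3^2 + 4x_1x_2 - 8x_1x_3 - 4x_2x_3$$

is positive definite since its principal minors

$$5, \quad \begin{vmatrix} 5 & 2 \\ 2 & 1 \end{vmatrix} = 1, \quad
\begin{vmatrix} 5 & 2 & -4 \\ 2 & 1 & -2 \\ -4 & -2 & 5 \end{vmatrix} = 1$$

are positive.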
Example 2. The quadratic form

$$f = 3x_1^2 + x_2^2 + 5x_3^2 + 4x_1x_2 - 8x_1x_3 - 4x_2x_3$$

is not positive definite since its second principal minor is negative:

$$\begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix} = -1$$

Note that by analogy with positive definite quadratic forms we
can introduce negative definite forms, that is, nonsingular quadratic
forms with real coefficients whose normal form contains only negative
squares of the unknowns. Singular quadratic forms whose normal form
consists of squares of one sign are sometimes termed semidefinite.
Finally, indefinite quadratic forms are those whose normal form
contains both positive and negative squares of the unknowns.
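
The criterion of principal minors is easy to try out mechanically. The
following Python sketch is one possible way to do so; the function names
`determinant`, `principal_minors` and `is_positive_definite` are
illustrative and do not come from the text, and exact rational
arithmetic is assumed so that signs are not disturbed by rounding. The
two matrices are those of the forms in Examples 1 and 2.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for k in range(n):
        # Find a row with a nonzero entry in column k.
        pivot_row = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != k:
            m[k], m[pivot_row] = m[pivot_row], m[k]
            det = -det  # a row interchange changes the sign
        det *= m[k][k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n):
                m[r][c] -= factor * m[k][c]
    return det


def principal_minors(matrix):
    """Upper-left minors of orders 1, 2, ..., n."""
    n = len(matrix)
    return [determinant([row[:k] for row in matrix[:k]])
            for k in range(1, n + 1)]


def is_positive_definite(matrix):
    """True if every principal minor is strictly positive."""
    return all(d > 0 for d in principal_minors(matrix))


A1 = [[5, 2, -4], [2, 1, -2], [-4, -2, 5]]   # matrix of Example 1
A2 = [[3, 2, -4], [2, 1, -2], [-4, -2, 5]]   # matrix of Example 2
print(principal_minors(A1), is_positive_definite(A1))
print(principal_minors(A2), is_positive_definite(A2))
```

Note that the off-diagonal entries of each matrix are half the
coefficients of the corresponding products $x_i x_j$ in the form.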


CHAPTER 7
LINEAR SPACES 


29. Definition of a Linear Space. An Isomorphism 


The definition of an n-dimensional vector space given in Sec. 8 
began with a definition of an n-dimensional vector as an ordered 
set of n numbers (n-tuple). For n-dimensional vectors we then intro- 
duced addition and multiplication by scalars, which is what led 
to the concept of an n-dimensional vector space. The first instances of 
vector spaces are collections of vector segments emanating from a 
coordinate origin in the plane or in three-dimensional space. Howe- 
ver, when we encounter such cases in geometry, we do not always 
find it necessary to specify the vectors via their components in some 
fixed system of coordinates, since both addition of vectors and their 
multiplication by a scalar are determined geometrically, irrespective
of the choice of any coordinate system. Namely, the addition
of vectors in the plane or in space is accomplished by the
parallelogram rule, while the multiplication of a vector by a scalar
$\alpha$ signifies a stretching of the vector by the factor $\alpha$
(the direction is reversed if $\alpha$ is negative). It is advisable to
give a “coordinateless” definition of a vector space in the general
case as well. By this is meant
a definition which does not require specifying vectors by ordered 
sets of numbers. We now give such a definition. This definition is 
axiomatic; nothing will be said about the properties of a separate 
vector, but we will enumerate the properties of operations invol- 
ving vectors. 

Suppose we have a set V. We denote its elements by lower-case
Latin letters: a, b, c, ....* Now, in set V we define the operation
of addition, which associates every pair of elements a, b in V with
a uniquely defined element a + b in V, called the sum, and the
operation of multiplication by a real number (scalar); the product
$\alpha a$ of element a by a scalar $\alpha$ is uniquely defined and
belongs to V.

The elements of V will be termed vectors, and V itself will be 
called a real linear (or vector, or affine) space if the indicated opera- 
tions have the following properties (I to VIII). 


* In contrast to Chapter 2, here and in the sequel, vectors will be desig- 
nated by lower-case Latin letters, scalars by lower-case Greek letters. 

I. Addition is commutative: a + b = b + a.
II. Addition is associative: (a + b) + c = a + (b + c).
III. There is a zero element 0 in V which satisfies the condition:
a + 0 = a for all a in V.

Using I it is easy to prove the uniqueness of the zero element: if
$0_1$ and $0_2$ are two zero elements, then

$$0_1 + 0_2 = 0_1, \qquad 0_1 + 0_2 = 0_2 + 0_1 = 0_2$$

whence $0_1 = 0_2$.

IV. For any element a in V there exists an opposite (inverse) element
-a, which satisfies the condition: a + (-a) = 0.

Using II and I, it is easy to prove the uniqueness of the inverse
element: if $(-a)_1$ and $(-a)_2$ are two inverse elements of a, then

$$(-a)_1 + [a + (-a)_2] = (-a)_1 + 0 = (-a)_1,$$
$$[(-a)_1 + a] + (-a)_2 = 0 + (-a)_2 = (-a)_2$$

whence $(-a)_1 = (-a)_2$.
From axioms I to IV we deduce the existence and uniqueness of the
difference a - b, that is, an element which satisfies the equation

$$b + x = a \qquad (1)$$

We can set

$$a - b = a + (-b)$$

since

$$b + [a + (-b)] = [b + (-b)] + a = 0 + a = a$$

Now if there is an element c that satisfies (1),

$$b + c = a$$

then, by adding to both sides the element -b, we get

$$c = a + (-b)$$


Axioms V to VIII (cf. Sec. 8) relate multiplication by a scalar
to addition and to operations involving scalars. Namely, for any
elements a, b in V, for any real numbers $\alpha$, $\beta$, and for the
real number 1, the following equalities must hold:

V. $\alpha(a + b) = \alpha a + \alpha b$,

VI. $(\alpha + \beta)a = \alpha a + \beta a$,

VII. $(\alpha\beta)a = \alpha(\beta a)$,

VIII. $1 \cdot a = a$.
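
Elementary corollaries to these axioms are:

[1] $\alpha \cdot 0 = 0$

Indeed, for some a in V,

$$\alpha a = \alpha(a + 0) = \alpha a + \alpha \cdot 0$$

that is

$$\alpha \cdot 0 = \alpha a - \alpha a = \alpha a + [-(\alpha a)] = 0$$

[2] $0 \cdot a = 0$

where the zero on the left is the number zero and the zero on the
right is the zero element of V.

To prove this, take any scalar $\alpha$. Then

$$\alpha a = (\alpha + 0)a = \alpha a + 0 \cdot a$$

whence

$$0 \cdot a = \alpha a - \alpha a = 0$$

[3] If $\alpha a = 0$, then either $\alpha = 0$ or a = 0.

If $\alpha \neq 0$, that is, the scalar $\alpha^{-1}$ exists, then

$$a = 1 \cdot a = (\alpha^{-1}\alpha)a = \alpha^{-1}(\alpha a) = \alpha^{-1} \cdot 0 = 0$$

[4] $\alpha(-a) = -\alpha a$

Indeed,

$$\alpha a + \alpha(-a) = \alpha[a + (-a)] = \alpha \cdot 0 = 0$$

that is, the element $\alpha(-a)$ is the inverse of $\alpha a$.

[5] $(-\alpha)a = -\alpha a$

Indeed,

$$\alpha a + (-\alpha)a = [\alpha + (-\alpha)]a = 0 \cdot a = 0$$

that is, the element $(-\alpha)a$ is the inverse of $\alpha a$.

[6] $\alpha(a - b) = \alpha a - \alpha b$

By [4],

$$\alpha(a - b) = \alpha[a + (-b)] = \alpha a + \alpha(-b) = \alpha a + (-\alpha b) = \alpha a - \alpha b$$

[7] $(\alpha - \beta)a = \alpha a - \beta a$

Indeed,

$$(\alpha - \beta)a = [\alpha + (-\beta)]a = \alpha a + (-\beta)a = \alpha a + (-\beta a) = \alpha a - \beta a$$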
These axioms and their corollaries will be used from now on with- 
out any special reservations. 

The definition given above is for a real linear space. If we assumed,
in V, multiplication not only by real numbers but also by arbitrary
complex numbers, then, retaining Axioms I to VIII, we would
have the definition of a complex linear space. For the sake of
definiteness, we will consider real linear spaces; however, everything
in this chapter can be extended word for word to the case of complex
linear spaces.

Examples of real linear spaces come to mind immediately. They 
include the n-dimensional real vector spaces composed of row vec- 
tors that were studied in Chapter 2, also sets of vector segments 
emanating from a coordinate origin in the plane or in three-dimen- 
sional space if the operations of addition and multiplication by 
a scalar are understood in the geometric sense stated at the beginning
of this section.

We also have linear spaces that are infinite-dimensional. Let
us consider all possible sequences of real numbers; they have
the form

$$a = (\alpha_1, \alpha_2, \ldots, \alpha_n, \ldots)$$

We perform operations on sequences componentwise: if

$$b = (\beta_1, \beta_2, \ldots, \beta_n, \ldots)$$

then

$$a + b = (\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots, \alpha_n + \beta_n, \ldots)$$

On the other hand, for any real number $\gamma$,

$$\gamma a = (\gamma\alpha_1, \gamma\alpha_2, \ldots, \gamma\alpha_n, \ldots)$$


All the axioms from I to VIII are fulfilled, which means we have 
a real linear space. 

Another instance of an infinite-dimensional space is the set of 
all possible real functions of a real variable if the addition of func- 
tions and their multiplication by a real number are to be understood 
as is conventional in the theory of functions, that is, as the addition
of the values of the functions, or the multiplication of these values
by the number, for each value of the independent variable.
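
The pointwise operations on functions can be illustrated by a short
Python sketch. The names `add`, `scale` and `zero` are illustrative
only, and Python callables on floating-point numbers stand in here for
real functions of a real variable, which is of course only an
approximation of the mathematical object.

```python
import math


def add(f, g):
    """Sum of two functions: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(gamma, f):
    """Product of a function by a number: (gamma f)(x) = gamma * f(x)."""
    return lambda x: gamma * f(x)


def zero(x):
    """The zero element of the space: the function identically zero."""
    return 0.0


# The "vector" sin + 2 cos, built only from the two operations.
h = add(math.sin, scale(2.0, math.cos))
print(h(0.0))
```

Each axiom I to VIII then holds because it holds for the real values
f(x), g(x) at every fixed x.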

Isomorphisms. Our immediate aim is to select from all linear
spaces those which it will be natural to call finite-dimensional. 
First let us introduce a general concept. 

In the definition of a linear space we spoke about the properties 
of operations involving vectors, but we said nothing about the pro- 
perties of the vectors themselves. Thus, it may happen that although 
the vectors of two given linear spaces are quite different as to their 
nature, the two spaces are indistinguishable from the standpoint of
the properties of the operations. The exact definition is as follows. 

Two real linear spaces V and $V'$ are called isomorphic if a one-to-one
correspondence can be set up between their vectors: every vector a of V
is associated with a vector $a'$ of $V'$, the image of the vector a;
different vectors from V possess different images and every vector in
$V'$ serves as an image of some vector in V; and if in this
correspondence the image of a sum of two vectors is the sum of the
images of the two vectors,

$$(a + b)' = a' + b' \qquad (2)$$

and the image of a product of a vector by a scalar is the product
of the image of the vector by that scalar,

$$(\alpha a)' = \alpha a' \qquad (3)$$


The one-to-one correspondence between spaces V and V’ which 
satisfies the conditions (2) and (3) is called an isomorphic correspon- 
dence. 

Thus, the space of vector segments (in a plane) emanating from 
a coordinate origin is isomorphic to a two-dimensional vector space 
made up of ordered pairs of real numbers: we obtain an isomorphic 
correspondence between these spaces if in the plane we fix some system
of coordinates and associate with every vector segment an ordered pair
of its coordinates.

Let us prove the following property of an isomorphism of linear 
spaces: the image of zero of the space V is the zero of the space V' in 
an isomorphic correspondence between V and V’. 

Let a be an arbitrary vector in V and $a'$ its image in $V'$. Then,
by (2),

$$a' = (a + 0)' = a' + 0'$$

Since every vector of $V'$ is the image $a'$ of some vector a of V,
this means that $0'$ is a zero of the space $V'$.


30. Finite-Dimensional Spaces. Bases 


As the reader can verify without difficulty, the two definitions 
of linear dependence of row vectors given in Sec. 9, and also the 
proof of the equivalence of these definitions, employ only operations 
on vectors and for this reason can be carried over to the case of any
linear spaces. Consequently, in axiomatically defined linear spaces 
we can speak of linearly independent systems of vectors, of maxi- 
mal linearly independent systems, if such exist, and so on. 

If the linear spaces V and $V'$ are isomorphic, then the system of
vectors $a_1, a_2, \ldots, a_k$ in V is linearly dependent if and only
if the system of their images $a_1', a_2', \ldots, a_k'$ in $V'$ is
linearly dependent.

Note that if the correspondence $a \to a'$ (for all a in V) is an
isomorphic correspondence between V and $V'$, then the reverse
correspondence $a' \to a$ will also be isomorphic. It is therefore
sufficient to consider the case when the system
$a_1, a_2, \ldots, a_k$ is linearly dependent. Let there be scalars
$\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that

$$\alpha_1 a_1 + \alpha_2 a_2 + \ldots + \alpha_k a_k = 0$$

In the isomorphism under consideration, the image of the right
member of this equation is, as we know, the zero $0'$ of space $V'$.
Taking the image of the left member and applying (2) and (3) several
times, we get

$$\alpha_1 a_1' + \alpha_2 a_2' + \ldots + \alpha_k a_k' = 0'$$

Thus, the system $a_1', a_2', \ldots, a_k'$ is also linearly dependent.

Finite-dimensional spaces. A linear space V is called finite-di- 
mensional if in it we can find a finite maximal linearly independent 
system of vectors; any such system of vectors will be termed a
basis of the space V.

A finite-dimensional linear space can have many different bases. 
Thus, in the space of vector segments in the plane, any pair of vec- 
tors different from zero and not lying on one straight line can serve 
as a basis. Note that so far our definition of a finite-dimensional 
space does not specify whether there can exist, in this space, bases 
consisting of a different number of vectors. What is more, it might 
even be assumed that in some finite-dimensional spaces there exist 
bases with an arbitrarily large number of vectors. Let us investigate 
this situation. 

Suppose a linear space V has a basis

$$e_1, e_2, \ldots, e_n \qquad (1)$$

consisting of n vectors. If a is an arbitrary vector in V, then from
the maximality of the linearly independent system (1) it follows that
a is expressed linearly in terms of the system:

$$a = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n \qquad (2)$$

On the other hand, due to the linear independence of (1), expression
(2) will be unique for the vector a: if

$$a = \alpha_1' e_1 + \alpha_2' e_2 + \ldots + \alpha_n' e_n$$

then

$$(\alpha_1 - \alpha_1') e_1 + (\alpha_2 - \alpha_2') e_2 + \ldots + (\alpha_n - \alpha_n') e_n = 0$$

whence

$$\alpha_i = \alpha_i', \quad i = 1, 2, \ldots, n$$

Thus, the vector a is associated one-to-one with the row

$$(\alpha_1, \alpha_2, \ldots, \alpha_n) \qquad (3)$$

of coefficients of its expression (2) in terms of the basis (1) or, as
we shall say, the row of its coordinates in the basis (1). Conversely,
every row of type (3), that is, any n-dimensional vector in the sense
of Chapter 2, serves as a row of coordinates in basis (1) for some
vector of space V, namely, for the vector written in the form (2) in
terms of the basis (1).

We have thus obtained a one-to-one correspondence between all 
vectors of the space V and all vectors of an n-dimensional vector 
row-space. We will show that this correspondence, which quite natu- 
rally is dependent on the choice of the basis (1), is isomorphic. 

In space V let us, in addition to vector a, which is expressed in
terms of the basis (1) in the form (2), also take a vector b whose
expression in terms of the basis (1) is

$$b = \beta_1 e_1 + \beta_2 e_2 + \ldots + \beta_n e_n$$

Then

$$a + b = (\alpha_1 + \beta_1) e_1 + (\alpha_2 + \beta_2) e_2 + \ldots + (\alpha_n + \beta_n) e_n$$

that is, the sum of the vectors a and b corresponds to the sum of the
rows of their coordinates in the basis (1). On the other hand,

$$\gamma a = (\gamma\alpha_1) e_1 + (\gamma\alpha_2) e_2 + \ldots + (\gamma\alpha_n) e_n$$

that is, to the product of a vector a by a scalar $\gamma$ corresponds
the product of the row of its coordinates in the basis (1) by the same
scalar $\gamma$.

The foregoing proves the following theorem. 

Any linear space with a basis consisting of n vectors is isomorphic 
to an n-dimensional vector row-space. 
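
The correspondence between vectors and their coordinate rows can be
checked on a small case. The sketch below is only an illustration under
stated assumptions: the space is taken to be the plane of pairs of
numbers, the basis vectors `E1` and `E2` are an arbitrary choice, and
the names `coordinates`, `add` and `scale` are not from the text. The
coordinates are found by Cramer's rule.

```python
from fractions import Fraction

# An illustrative basis of the plane, different from the unit vectors.
E1 = (Fraction(1), Fraction(1))
E2 = (Fraction(1), Fraction(-1))


def add(u, v):
    """Componentwise sum of two rows."""
    return tuple(x + y for x, y in zip(u, v))


def scale(gamma, u):
    """Product of a row by the scalar gamma."""
    return tuple(gamma * x for x in u)


def coordinates(v):
    """Row (alpha1, alpha2) such that v = alpha1*E1 + alpha2*E2."""
    det = E1[0] * E2[1] - E2[0] * E1[1]
    alpha1 = (v[0] * E2[1] - E2[0] * v[1]) / det
    alpha2 = (E1[0] * v[1] - v[0] * E1[1]) / det
    return (alpha1, alpha2)


a = (3, 1)
b = (0, 2)
# The image of a sum is the sum of the images, and likewise for scalars.
print(coordinates(add(a, b)) == add(coordinates(a), coordinates(b)))
print(coordinates(scale(5, a)) == scale(5, coordinates(a)))
```

A different choice of `E1` and `E2` gives a different correspondence,
but one that again preserves both operations.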

As we know, in an isomorphic correspondence between linear 
spaces, a linearly dependent system of vectors goes into a linearly 
dependent system and conversely; for this reason, a linearly inde- 
pendent system goes into a linearly independent system. From this 
it follows that in an isomorphic correspondence, a basis goes into a basis. 

Indeed, let a basis $e_1, e_2, \ldots, e_n$ of a space V go (under an
isomorphic correspondence between the spaces V and $V'$) into a system
of vectors $e_1', e_2', \ldots, e_n'$ of space $V'$, which, though it
is linearly independent, is not maximal. Consequently, in $V'$ we can
find a vector $f'$ such that the system $e_1', e_2', \ldots, e_n', f'$
remains linearly independent. However, the vector $f'$ in this
isomorphism serves as an image of some vector f in V. We find that the
system of vectors $e_1, e_2, \ldots, e_n, f$ must be linearly
independent, which is in contradiction to the definition of a basis.

Further, we know (see Sec. 9) that in an n-dimensional vector row- 
space, all maximal linearly independent systems consist of n vec- 
tors, that any system of n + 1 vectors is linearly dependent, and 
that any linearly independent system of vectors is contained in some 
maximal linearly independent system. Using the above-established 
properties of isomorphic correspondences, we arrive at the following 
results. 

All bases of a finite-dimensional linear space V consist of one and
the same number of vectors. If this number is equal to n, then V is
called an n-dimensional linear space, and the number n is the
dimension of this space.

Any system of n+1 vectors of an n-dimensional linear space is 
linearly dependent. 

Any linearly independent system of vectors of an n-dimensional 
linear space is contained in some basis of that space. 

It is now easy to verify that the above-indicated examples of 
real linear spaces—the space of sequences and the space of func- 
tions—are not finite-dimensional spaces: in each of these spaces the 
reader will easily find linearly independent systems consisting 
of an arbitrarily large number of vectors. 

Relationships between bases. We are interested in finite-dimen- 
sional linear spaces. Clearly, when studying n-dimensional linear 
spaces we are actually studying the n-dimensional vector row-space 
that was introduced back in Chapter 2. Earlier, however, we singled
out one basis of this space, namely, the basis composed of unit
vectors (these are vectors, one coordinate of which is equal to unity
and all others are zero), and all the vectors of the space were
specified by the rows of their coordinates in that basis. Now, however,
all bases of a space have equal status.

Let us see how many bases can be found in an n-dimensional 
linear space and how these bases are interrelated. 

Suppose in an n-dimensional linear space V we have the bases

$$e_1, e_2, \ldots, e_n \qquad (4)$$

and

$$e_1', e_2', \ldots, e_n' \qquad (5)$$

Each vector of basis (5), like any vector of the space V, is
unambiguously written in terms of basis (4) as

$$e_i' = \sum_{j=1}^{n} \tau_{ij} e_j, \quad i = 1, 2, \ldots, n \qquad (6)$$

The matrix

$$T = \begin{pmatrix} \tau_{11} & \ldots & \tau_{1n} \\ \vdots & & \vdots \\ \tau_{n1} & \ldots & \tau_{nn} \end{pmatrix}$$

whose rows are the rows of the coordinates of the vectors (5) in basis
(4), is called the change-of-basis matrix from basis (4) to basis (5).

Because of (6), we can write the relationship between bases (4)
and (5) and the change-of-basis matrix T in the form of a matrix
equation:

$$\begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_n' \end{pmatrix} =
\begin{pmatrix} \tau_{11} & \tau_{12} & \ldots & \tau_{1n} \\ \tau_{21} & \tau_{22} & \ldots & \tau_{2n} \\ \vdots & \vdots & & \vdots \\ \tau_{n1} & \tau_{n2} & \ldots & \tau_{nn} \end{pmatrix}
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} \qquad (7)$$

or, denoting by e and $e'$, respectively, the bases (4) and (5) written
as columns:

$$e' = Te$$

On the other hand, if $T'$ is the change-of-basis matrix from (5)
to (4), then

$$e = T'e'$$

whence

$$e = (T'T)e, \qquad e' = (TT')e'$$

or, because of the linear independence of the bases e and $e'$,

$$T'T = TT' = E$$

whence

$$T' = T^{-1}$$

This proves that the change-of-basis matrix is always a nonsingular
matrix.

Any nonsingular square matrix of order n with real elements can 
serve as a matrix for changing from a given basis of an n-dimensional 
real linear space to some other basis. 

Suppose we have a given basis (4) and a nonsingular matrix T 
of order n. For (5) take a system of vectors for which the rows of 
matrix T serve as the rows of coordinates in basis (4); thus, we have 
equation (7). The vectors (5) are linearly independent (linear depen- 
dence would have implied a linear dependence of the rows of mat- 
rix T, in conflict with its nonsingularity). Therefore, system (5), 
as a linearly independent system consisting of n vectors, is a basis 
of our space, and the matrix T serves as a change-of-basis matrix 
from basis (4) to basis (5). 

We see that in an n-dimensional linear space we can find as many
distinct bases as there are distinct nonsingular square matrices of 
order n. True, here, two bases consisting of the same vectors but 
written in a different order are considered distinct. 

Transformation of vector coordinates. Suppose in an n-dimensional
linear space we have the bases (4) and (5) given with the
change-of-basis matrix $T = (\tau_{ij})$:

$$e' = Te$$

Let us find the connection between the coordinate rows of an arbitrary
vector a in these bases.
Let

$$a = \sum_{j=1}^{n} \alpha_j e_j, \qquad a = \sum_{i=1}^{n} \alpha_i' e_i' \qquad (8)$$

Using (6) we find

$$a = \sum_{i=1}^{n} \alpha_i' \Bigl( \sum_{j=1}^{n} \tau_{ij} e_j \Bigr) = \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{n} \alpha_i' \tau_{ij} \Bigr) e_j$$

Comparing with (8) and using the uniqueness of vector notation in
terms of a basis, we obtain

$$\alpha_j = \sum_{i=1}^{n} \alpha_i' \tau_{ij}, \quad j = 1, 2, \ldots, n$$

Thus we have the matrix equation

$$(\alpha_1, \alpha_2, \ldots, \alpha_n) = (\alpha_1', \alpha_2', \ldots, \alpha_n')\, T$$

Thus, the row of coordinates of the vector a in the basis e is equal
to the row of coordinates of this vector in the basis $e'$ multiplied
on the right by the change-of-basis matrix from the basis e to the
basis $e'$.

Whence clearly follows the equation

$$(\alpha_1', \alpha_2', \ldots, \alpha_n') = (\alpha_1, \alpha_2, \ldots, \alpha_n)\, T^{-1}$$


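Example. Consider a three-dimensional real linear space with the basis

$$e_1, e_2, e_3 \qquad (9)$$

The vectors

$$\begin{aligned}
e_1' &= 5e_1 - e_2 - 2e_3, \\
e_2' &= 2e_1 + 3e_2, \\
e_3' &= -2e_1 + e_2 + e_3
\end{aligned} \qquad (10)$$

also form a basis in this space, the matrix

$$T = \begin{pmatrix} 5 & -1 & -2 \\ 2 & 3 & 0 \\ -2 & 1 & 1 \end{pmatrix}$$

serving as the change-of-basis matrix from (9) to (10). We then have

$$T^{-1} = \begin{pmatrix} 3 & -1 & 6 \\ -2 & 1 & -4 \\ 8 & -3 & 17 \end{pmatrix}$$

The vector

$$a = e_1 + 4e_2 - e_3$$

therefore has, in basis (10), the row of coordinates

$$(\alpha_1', \alpha_2', \alpha_3') = (1, 4, -1) \begin{pmatrix} 3 & -1 & 6 \\ -2 & 1 & -4 \\ 8 & -3 & 17 \end{pmatrix} = (-13, 6, -27)$$

or

$$a = -13e_1' + 6e_2' - 27e_3'$$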
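
The arithmetic of this example can be verified with a few lines of
Python. This is only a checking sketch: the helper names `mat_mul` and
`row_times_matrix` are illustrative, and the matrices are copied from
the example above.

```python
T = [[5, -1, -2],
     [2, 3, 0],
     [-2, 1, 1]]

T_INV = [[3, -1, 6],
         [-2, 1, -4],
         [8, -3, 17]]

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_times_matrix(row, m):
    """Product of a coordinate row by a matrix on the right."""
    return [sum(row[i] * m[i][j] for i in range(len(row)))
            for j in range(len(m[0]))]


# T_INV really is the inverse of T.
print(mat_mul(T, T_INV) == IDENTITY)

# Coordinates of a = e1 + 4*e2 - e3 in the new basis (10).
new_row = row_times_matrix([1, 4, -1], T_INV)
print(new_row)

# Multiplying on the right by T returns the old coordinates.
print(row_times_matrix(new_row, T))
```

The last line illustrates the first of the two matrix equations: the
row in the old basis equals the row in the new basis multiplied on the
right by T.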


31. Linear Transformations


In Chapter 3 we dealt with the concept of a linear transformation 
of unknowns. The concept we now introduce bears the same 
name but is different in character. Certain relationships can 
nevertheless be established between these two notions. 

Let there be given an n-dimensional real linear space, which 
we denote by $V_n$. We consider a transformation of this space, that 
is, a mapping which takes every vector a of $V_n$ into some vector a' 
of the same space. The vector a' is called the image of a under the 
given transformation. 

If we use $\varphi$ to denote the transformation, then the image of 
vector a will be written as $a\varphi$ instead of the more customary 
$\varphi(a)$ or $\varphi a$. Thus, 

$$a' = a\varphi$$

A transformation $\varphi$ of a linear space $V_n$ is called a linear 
transformation of this space if it takes the sum of any two vectors a, b into 
the sum of the images of these vectors: 

$$(a + b)\varphi = a\varphi + b\varphi \tag{1}$$

and the product of any vector a by any scalar $\alpha$ into the product 
of the image of the vector a by that same scalar $\alpha$: 

$$(\alpha a)\varphi = \alpha (a\varphi) \tag{2}$$

From this definition, it immediately follows that a linear 
transformation of a linear space carries any linear combination of given 
vectors $a_1, a_2, \ldots, a_k$ into a linear combination (with the same 
coefficients) of the images of the vectors: 

$$(\alpha_1 a_1 + \alpha_2 a_2 + \ldots + \alpha_k a_k)\varphi = \alpha_1 (a_1\varphi) + \alpha_2 (a_2\varphi) + \ldots + \alpha_k (a_k\varphi) \tag{3}$$

Let us prove the following assertion. 
Under any linear transformation $\varphi$ of a linear space $V_n$, the zero 
vector 0 remains fixed, 

$$0\varphi = 0$$

and the image of the inverse of the given vector a is a vector that is inverse 
to the image of a: 

$$(-a)\varphi = -a\varphi$$

Indeed, if b is an arbitrary vector, then, by (2), 

$$0\varphi = (0 \cdot b)\varphi = 0 \cdot (b\varphi) = 0$$

On the other hand, 

$$(-a)\varphi = [(-1)a]\varphi = (-1)(a\varphi) = -a\varphi$$

The concept of a linear transformation of a linear space arose 
as a generalization of the familiar analytic geometry concept of 
the affine transformation of a plane or of three-dimensional space. 
Indeed, conditions (1) and (2) are fulfilled under affine transforma- 
tions. These conditions are also fulfilled for projections of vectors 
on a plane or, in three-dimensional space, on a straight line (or a 
plane). Thus, for example, in a two-dimensional linear space of 
vector segments emanating from the origin of the plane, the trans- 
formation carrying a vector into its projection on some axis passing 
through the origin is a linear transformation. 

Examples of linear transformations in an arbitrary space $V_n$ 
are the identity transformation $\varepsilon$, which leaves every vector a fixed, 

$$a\varepsilon = a$$

and the zero transformation $\omega$, which maps every vector a into zero, 

$$a\omega = 0$$

We will now obtain a survey of all linear transformations of 
a linear space $V_n$. Let 

$$e_1, e_2, \ldots, e_n \tag{4}$$

be a basis of this space. As we have already done, denote by e the 
basis (4) arranged in a column. Since any vector a of the space $V_n$ 
is uniquely represented as a linear combination of vectors of the 
basis (4), it follows, by (3), that the image of vector a can be expressed 
with the same coefficients in terms of the images of the vectors (4). 
In other words, any linear transformation $\varphi$ of $V_n$ is uniquely 
determined by specifying the images $e_1\varphi, e_2\varphi, \ldots, e_n\varphi$ 
of all vectors of the fixed basis (4). 

No matter what the ordered system of n vectors of $V_n$, 

$$c_1, c_2, \ldots, c_n \tag{5}$$

there is a unique linear transformation $\varphi$ of this space such that (5) 
serves as the system of images of the vectors of basis (4) under this 
transformation, 

$$e_i\varphi = c_i, \qquad i = 1, 2, \ldots, n \tag{6}$$

The uniqueness of the transformation $\varphi$ has already been proved; 
it remains to prove its existence. Let us define the transformation $\varphi$ 
as follows: if a is an arbitrary vector of the space and 

$$a = \sum_{i=1}^{n} \alpha_i e_i$$

is its notation in the basis (4), then put 

$$a\varphi = \sum_{i=1}^{n} \alpha_i c_i \tag{7}$$

Let us prove the linearity of this transformation. If 

$$b = \sum_{i=1}^{n} \beta_i e_i$$

is any other vector of the space, then 

$$(a + b)\varphi = \Big[ \sum_{i=1}^{n} (\alpha_i + \beta_i) e_i \Big]\varphi = \sum_{i=1}^{n} (\alpha_i + \beta_i) c_i = \sum_{i=1}^{n} \alpha_i c_i + \sum_{i=1}^{n} \beta_i c_i = a\varphi + b\varphi$$

But if $\gamma$ is any scalar, then 

$$(\gamma a)\varphi = \Big[ \sum_{i=1}^{n} (\gamma\alpha_i) e_i \Big]\varphi = \sum_{i=1}^{n} (\gamma\alpha_i) c_i = \gamma \sum_{i=1}^{n} \alpha_i c_i = \gamma (a\varphi)$$

The correctness of (6) follows from the definition (7) of the 
transformation $\varphi$, since all coordinates of the vector $e_i$ in the basis (4) 
are zero (except the ith coordinate, which is equal to unity). 
We have thus established a one-to-one correspondence between all 
linear transformations of the linear space $V_n$ and all ordered systems 
(5) made up of n vectors of this space. 
However, every vector $c_i$ has a definite notation in the basis (4): 

$$c_i = \sum_{j=1}^{n} \alpha_{ij} e_j, \qquad i = 1, 2, \ldots, n \tag{8}$$

We can form a square matrix of the coordinates of the vectors $c_i$ 
in the basis (4), 

$$A = (\alpha_{ij}) \tag{9}$$

taking for its ith row the row of coordinates of the vector $c_i$, 
i = 1, 2, ..., n. Since system (5) was arbitrary, the matrix A will 
be an arbitrary square matrix of order n with real elements. 

We thus have a one-to-one correspondence between all linear 
transformations of the space $V_n$ and all square matrices of order n; this 
correspondence is of course dependent on the choice of basis (4). 

We shall say that the matrix A specifies a linear transformation $\varphi$ 
in the basis (4) or, more succinctly, that A is the matrix of the linear 
transformation $\varphi$ in the basis (4). If by $e\varphi$ we denote a column 
composed of the images of the vectors of (4), then from (6), (8) and (9) 
there follows a matrix equation which completely describes the 
relationships existing between the linear transformation $\varphi$, the basis e 
and the matrix A specifying the linear transformation in that basis: 

$$e\varphi = Ae \tag{10}$$



Let us show how, knowing the matrix A of a linear transformation 
$\varphi$ in basis (4), it is possible, via the coordinates of the vector a 
in this basis, to find the coordinates of its image $a\varphi$. If 

$$a = \sum_{i=1}^{n} \alpha_i e_i$$

then 

$$a\varphi = \sum_{i=1}^{n} \alpha_i (e_i\varphi)$$

which is equivalent to the matrix equation 

$$a\varphi = (\alpha_1, \alpha_2, \ldots, \alpha_n)\, e\varphi$$

Utilizing (10) and taking into account that the associativity of 
matrix multiplication is easy to verify when one of the matrices 
is a column made up of vectors, we obtain 

$$a\varphi = [(\alpha_1, \alpha_2, \ldots, \alpha_n) A]\, e$$

Whence it follows that the row of coordinates of the vector $a\varphi$ is equal 
to the row of the coordinates of the vector a multiplied on the right by 
the matrix A of the linear transformation $\varphi$, all in the basis (4). 


Example. Let there be a linear transformation given by the following 
matrix in a basis $e_1, e_2, e_3$ of three-dimensional linear space: 

$$A = \begin{pmatrix} -2 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & -4 & 1 \end{pmatrix}$$

If 

$$a = 5e_1 + e_2 - 2e_3$$

then 

$$(5, 1, -2) \begin{pmatrix} -2 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & -4 & 1 \end{pmatrix} = (-9, 16, 0)$$

that is, 

$$a\varphi = -9e_1 + 16e_2$$
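The rule "row of coordinates times the matrix A" is easy to mechanize. The sketch below reproduces the example above in Python; the function name `image_coordinates` is an illustrative assumption, and the matrix and vector are the ones read off from the example.

```python
def image_coordinates(row, A):
    """Coordinates of a*phi: the coordinate row of a multiplied on the right by A."""
    n = len(A)
    return [sum(row[i] * A[i][j] for i in range(n)) for j in range(n)]

# Matrix of the transformation in the basis e_1, e_2, e_3.
A = [[-2, 1, 0],
     [1, 3, 2],
     [0, -4, 1]]

a = [5, 1, -2]                       # a = 5e_1 + e_2 - 2e_3
image = image_coordinates(a, A)
print(image)                         # [-9, 16, 0]

# Each basis vector e_i is carried into the vector whose coordinates form row i of A.
basis_images = [image_coordinates([int(i == j) for j in range(3)], A) for i in range(3)]
```

The last line illustrates equation (10): the images of the basis vectors are exactly the rows of A.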


Relationships between matrices of a linear transformation in 
different bases. Quite naturally, a matrix specifying a linear 
transformation is dependent on the choice of the basis. We will show 
what the relationship is between matrices that specify one and the 
same linear transformation in different bases. 

Let there be given the bases e and e' with change-of-basis 
matrix T, 

$$e' = Te \tag{11}$$

and let the linear transformation $\varphi$ be given in these bases by 
matrices A and A', respectively, 

$$e\varphi = Ae, \qquad e'\varphi = A'e' \tag{12}$$

By (11), the second equation of (12) reduces to 

$$(Te)\varphi = A'(Te)$$

However, 

$$(Te)\varphi = T(e\varphi)$$

Indeed, if $(\tau_{i1}, \tau_{i2}, \ldots, \tau_{in})$ is the ith row of matrix T, then 

$$(\tau_{i1} e_1 + \tau_{i2} e_2 + \ldots + \tau_{in} e_n)\varphi = \tau_{i1}(e_1\varphi) + \tau_{i2}(e_2\varphi) + \ldots + \tau_{in}(e_n\varphi)$$

Hence, by (12), 

$$(Te)\varphi = T(e\varphi) = T(Ae) = (TA)e, \qquad A'(Te) = (A'T)e$$

that is, 

$$(TA)e = (A'T)e$$

If for at least one i, $1 \le i \le n$, the ith row of the matrix TA is 
different from the ith row of the matrix A'T, then two distinct 
linear combinations of vectors $e_1, e_2, \ldots, e_n$ will be equal to each 
other, which contradicts the linear independence of the basis e. 

Thus, 

$$TA = A'T$$

whence, due to the nonsingularity of the change-of-basis matrix T, 

$$A' = TAT^{-1}, \qquad A = T^{-1}A'T \tag{13}$$


Note that the square matrices B and C are called similar if they 
are connected by the equation 

$$C = Q^{-1}BQ$$

where Q is some nonsingular matrix. We say that the matrix C is 
obtained from B by a transformation by the matrix Q. 

The equations (13) proved above may be formulated as an 
important theorem. 

Matrices which represent one and the same linear transformation 
in different bases are similar. Moreover, the matrix of the linear 
transformation $\varphi$ in the basis e' is obtained by transforming the matrix of this 
linear transformation in the basis e via the change-of-basis matrix 
from basis e' to basis e. 

Let us point out that if a matrix A represents a linear 
transformation $\varphi$ in the basis e, then any matrix B similar to A, 

$$B = Q^{-1}AQ$$

also represents the transformation $\varphi$ in some basis, namely, in the 
basis obtained from e by means of the change-of-basis matrix $Q^{-1}$. 
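Formula (13) can be tested on concrete numbers. In the sketch below (Python, exact rational arithmetic) the matrices A and T are borrowed from the two worked examples of this chapter and paired purely for illustration; that pairing, the test row `x_prime`, and the helper names `matmul` and `inverse` are assumptions made for this sketch.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[-2, 1, 0], [1, 3, 2], [0, -4, 1]]     # matrix of phi in the basis e
T = [[5, -1, -2], [2, 3, 0], [-2, 1, 1]]    # change of basis, e' = T e
T_inv = inverse(T)
A_prime = matmul(matmul(T, A), T_inv)       # formula (13): A' = T A T^{-1}

x_prime = [[1, 2, 3]]                       # coordinates of some vector in e'
x = matmul(x_prime, T)                      # the same vector in e
via_e = matmul(matmul(x, A), T_inv)         # image found in e, then converted to e'
via_e_prime = matmul(x_prime, A_prime)      # image found directly in e'
print(via_e == via_e_prime)
```

Both routes give the same coordinate row, which is what it means for A and A' to represent the same transformation.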
Operations on linear transformations. Associating to every linear 
transformation of the space $V_n$ its matrix in a fixed basis, we obtain 
(as was proved above) a one-to-one correspondence between all 
linear transformations and all square matrices of order n. It is natural 
to expect that the operations of addition and multiplication of 
matrices and also matrix multiplication by a scalar will be associated 
with analogous operations involving linear transformations. 

Suppose we have the linear transformations $\varphi$ and $\psi$ in a space $V_n$. 
The sum of these transformations is the transformation $\varphi + \psi$ 
defined by the equation 

$$a(\varphi + \psi) = a\varphi + a\psi \tag{14}$$

It thus carries any vector a into the sum of its images under the 
transformations $\varphi$ and $\psi$. 

The transformation $\varphi + \psi$ is linear. Indeed, for all vectors a 
and b and any scalar $\alpha$, 

$$(a + b)(\varphi + \psi) = (a + b)\varphi + (a + b)\psi = a\varphi + b\varphi + a\psi + b\psi = a(\varphi + \psi) + b(\varphi + \psi),$$

$$(\alpha a)(\varphi + \psi) = (\alpha a)\varphi + (\alpha a)\psi = \alpha(a\varphi) + \alpha(a\psi) = \alpha(a\varphi + a\psi) = \alpha[a(\varphi + \psi)]$$

On the other hand, we use the term "product" of linear 
transformations $\varphi$ and $\psi$ for the transformation $\varphi\psi$ defined by the equation 

$$a(\varphi\psi) = (a\varphi)\psi \tag{15}$$

that is, the transformation obtained by successive application of the 
transformations $\varphi$ and $\psi$. 
The transformation $\varphi\psi$ is linear: 

$$(a + b)(\varphi\psi) = [(a + b)\varphi]\psi = (a\varphi + b\varphi)\psi = (a\varphi)\psi + (b\varphi)\psi = a(\varphi\psi) + b(\varphi\psi),$$

$$(\alpha a)(\varphi\psi) = [(\alpha a)\varphi]\psi = [\alpha(a\varphi)]\psi = \alpha[(a\varphi)\psi] = \alpha[a(\varphi\psi)]$$

Finally, we use the term "product" of a linear transformation $\varphi$ 
by a scalar $\kappa$ for the transformation $\kappa\varphi$ defined by 

$$a(\kappa\varphi) = \kappa(a\varphi) \tag{16}$$

Thus, the images of all vectors under the transformation $\varphi$ are 
multiplied by the scalar $\kappa$. 

The transformation $\kappa\varphi$ is linear: 

$$(a + b)(\kappa\varphi) = \kappa[(a + b)\varphi] = \kappa(a\varphi + b\varphi) = \kappa(a\varphi) + \kappa(b\varphi) = a(\kappa\varphi) + b(\kappa\varphi),$$

$$(\alpha a)(\kappa\varphi) = \kappa[(\alpha a)\varphi] = \kappa[\alpha(a\varphi)] = \alpha[\kappa(a\varphi)] = \alpha[a(\kappa\varphi)]$$

Let the transformations $\varphi$ and $\psi$ be given in the basis 
$e_1, e_2, \ldots, e_n$ by the matrices $A = (\alpha_{ij})$ and $B = (\beta_{ij})$, respectively, 

$$e\varphi = Ae, \qquad e\psi = Be$$

Then, by (14), 

$$e_i(\varphi + \psi) = e_i\varphi + e_i\psi = \sum_{j=1}^{n} \alpha_{ij} e_j + \sum_{j=1}^{n} \beta_{ij} e_j = \sum_{j=1}^{n} (\alpha_{ij} + \beta_{ij}) e_j$$

that is, 

$$e(\varphi + \psi) = (A + B)e$$

Thus, the matrix of a sum of linear transformations in any basis is 
equal to the sum of the matrices of these transformations in the same 
basis. 

On the other hand, by (15), 

$$e_i(\varphi\psi) = (e_i\varphi)\psi = \Big( \sum_{j=1}^{n} \alpha_{ij} e_j \Big)\psi = \sum_{j=1}^{n} \alpha_{ij} (e_j\psi) = \sum_{j=1}^{n} \alpha_{ij} \Big( \sum_{k=1}^{n} \beta_{jk} e_k \Big) = \sum_{k=1}^{n} \Big( \sum_{j=1}^{n} \alpha_{ij} \beta_{jk} \Big) e_k$$

that is, 

$$e(\varphi\psi) = (AB)e$$

In other words, the matrix of a product of linear transformations in 
any basis is equal to the product of the matrices of these transformations 
in the same basis. 

Finally, due to (16), 

$$e_i(\kappa\varphi) = \kappa(e_i\varphi) = \kappa \sum_{j=1}^{n} \alpha_{ij} e_j = \sum_{j=1}^{n} (\kappa\alpha_{ij}) e_j$$

that is, 

$$e(\kappa\varphi) = (\kappa A)e$$

Consequently, a matrix which in some basis specifies the product of 
a linear transformation $\varphi$ by a scalar $\kappa$ is equal to the product of the 
matrix of the transformation in this basis by the scalar $\kappa$. 

From the results obtained it follows that operations on linear 
transformations possess the same properties as operations on 
matrices. Thus, the addition of linear transformations is commutative 
and associative, while multiplication is associative but is not 
commutative for n > 1. For linear transformations there exists unique 
subtraction. Also note that in linear transformations, the identity 
transformation $\varepsilon$ plays the role of unity, and the zero transformation $\omega$, 
the role of zero. In any basis, the transformation $\varepsilon$ is given by the 
unit matrix, and the transformation $\omega$ is given by the zero matrix. 
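The correspondence between operations on transformations and operations on matrices can be illustrated with a small sketch in Python. The matrices A and B, the row x, the scalar `kappa` and the helper names are illustrative assumptions; a transformation is modelled simply by its action on coordinate rows.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply(row, A):
    """Image of a coordinate row under the transformation with matrix A."""
    return matmul([row], A)[0]

A = [[1, 2], [0, 1]]    # hypothetical matrix of phi
B = [[0, 1], [1, 0]]    # hypothetical matrix of psi
x = [3, -1]
kappa = 4

A_plus_B = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]
kappa_A = [[kappa * p for p in row] for row in A]

# (14): a(phi + psi) = a phi + a psi corresponds to A + B.
sum_ok = apply(x, A_plus_B) == [p + q for p, q in zip(apply(x, A), apply(x, B))]
# (15): a(phi psi) = (a phi) psi corresponds to AB (phi is applied first).
product_ok = apply(x, matmul(A, B)) == apply(apply(x, A), B)
# (16): a(kappa phi) = kappa (a phi) corresponds to kappa A.
scalar_ok = apply(x, kappa_A) == [kappa * p for p in apply(x, A)]
# Multiplication is not commutative in general.
commute = matmul(A, B) == matmul(B, A)

print(sum_ok, product_ok, scalar_ok, commute)   # True True True False
```

Because vectors are written as rows and transformations act on the right, the product in which $\varphi$ is applied first has the matrix AB, not BA.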


32. Linear Subspaces 


A subset L of a linear space V is called a linear subspace of this 
space if it is a linear space with respect to the operations defined 
in V of addition of vectors and the multiplication of a vector by 
a scalar. Thus, in three-dimensional Euclidean space, the collection 
of vectors emanating from the coordinate origin and lying in some 
plane (or on some straight line) passing through the origin is a linear 
subspace. 

For a nonempty subset L of space V to be a linear subspace of V, 
it suffices that the following requirements be met. 

1. If the vectors a and b lie in L, then the vector a + b also belongs 
to L. 

2. If the vector a belongs to L, then the vector $\alpha a$, for any value of 
the scalar $\alpha$, belongs to L too. 

Indeed, by Condition 2, the set L contains the zero vector: if 
vector a belongs to L, then L also contains $0 \cdot a = 0$. Furthermore, 
again by Condition 2, L contains together with a vector a the inverse vector 
$-a = (-1) \cdot a$, and therefore, due to Condition 1, L also contains 
the difference of any two vectors in L. As to all the other 
requirements that enter into the definition of a linear space, we can say that 
if they are fulfilled in V, then they will likewise be fulfilled in L. 

Instances of linear subspaces of the space V are the space V 
itself and also the set consisting of the zero vector alone, the so-called 
zero subspace. A more interesting example is the following: in the 
space V take any finite system of vectors 

$$a_1, a_2, \ldots, a_r \tag{1}$$

and denote by L the set of all those vectors which are linear 
combinations of the vectors of (1). We will prove that L is a linear subspace. 
Indeed, if 

$$b = \alpha_1 a_1 + \alpha_2 a_2 + \ldots + \alpha_r a_r, \qquad c = \beta_1 a_1 + \beta_2 a_2 + \ldots + \beta_r a_r$$

then 

$$b + c = (\alpha_1 + \beta_1) a_1 + (\alpha_2 + \beta_2) a_2 + \ldots + (\alpha_r + \beta_r) a_r$$

that is, the vector b + c belongs to L; also in L is the vector 

$$\gamma b = (\gamma\alpha_1) a_1 + (\gamma\alpha_2) a_2 + \ldots + (\gamma\alpha_r) a_r$$

for any scalar $\gamma$. 

We say that this linear subspace L is generated by the system of 
vectors (1); in particular, the vectors (1) themselves belong to L. 

Incidentally, any linear subspace of a finite-dimensional linear 
space is generated by a finite system of vectors, for if it is not a zero 
subspace, then it possesses a finite basis. The dimension of the linear 
subspace L is not greater than the dimension n of the space $V_n$ itself 
and is equal to n only when $L = V_n$. The dimension of the zero 
subspace is of course the number 0. 

For any k, 0 < k < n, in the space $V_n$ there are linear subspaces 
of dimension k. It is sufficient to take a subspace generated by any 
system of k linearly independent vectors. 

Let there be given linear subspaces $L_1$ and $L_2$ in the space V. 
The collection $L_0$ of vectors belonging both to $L_1$ and to $L_2$ will 
be a linear subspace, as can readily be verified. It is the intersection 
of the subspaces $L_1$ and $L_2$. On the other hand, another linear 
subspace is the sum L of the subspaces $L_1$ and $L_2$, or the collection of all 
those vectors in V which can be represented as a sum of two terms, 
one from $L_1$ and the other from $L_2$. If the dimensions of the 
subspaces $L_1$, $L_2$, $L_0$ and L are, respectively, $d_1$, $d_2$, $d_0$ and d, then the 
following formula holds: 

$$d = d_1 + d_2 - d_0 \tag{2}$$

which is to say that the dimension of the sum of two subspaces is equal 
to the sum of the dimensions of these subspaces diminished by the 
dimension of their intersection. 

To prove this, let us take an arbitrary basis 

$$a_1, a_2, \ldots, a_{d_0} \tag{3}$$

of subspace $L_0$ and augment it to obtain the basis 

$$a_1, a_2, \ldots, a_{d_0}, b_{d_0+1}, \ldots, b_{d_1} \tag{4}$$

of the subspace $L_1$ and also augment it to obtain the basis 

$$a_1, a_2, \ldots, a_{d_0}, c_{d_0+1}, \ldots, c_{d_2} \tag{5}$$

of the subspace $L_2$. Utilizing the definition of the subspace L, it is 
easy to see that this subspace is generated by the system of vectors 

$$a_1, a_2, \ldots, a_{d_0}, b_{d_0+1}, \ldots, b_{d_1}, c_{d_0+1}, \ldots, c_{d_2} \tag{6}$$

Formula (2) will thus be proved if we demonstrate the linear 
independence of system (6). 
Suppose the equation 

$$\alpha_1 a_1 + \alpha_2 a_2 + \ldots + \alpha_{d_0} a_{d_0} + \beta_{d_0+1} b_{d_0+1} + \ldots + \beta_{d_1} b_{d_1} + \gamma_{d_0+1} c_{d_0+1} + \ldots + \gamma_{d_2} c_{d_2} = 0$$

with certain numerical coefficients is true. Then 

$$d = \alpha_1 a_1 + \alpha_2 a_2 + \ldots + \alpha_{d_0} a_{d_0} + \beta_{d_0+1} b_{d_0+1} + \ldots + \beta_{d_1} b_{d_1} = -\gamma_{d_0+1} c_{d_0+1} - \ldots - \gamma_{d_2} c_{d_2} \tag{7}$$

The left member of this equation lies in $L_1$, the right member in $L_2$, 
therefore vector d (which is equal both to the left and to the right 
member of this equation) belongs to $L_0$ and, consequently, can 
be expressed linearly in terms of the basis (3). However, the right 
member of (7) shows that the vector d can also be expressed linearly 
in terms of the vectors $c_{d_0+1}, \ldots, c_{d_2}$. Whence, by the linear 
independence of system (5), it follows that all the coefficients 
$\gamma_{d_0+1}, \ldots, \gamma_{d_2}$ are zero, that is, that d = 0; but then, because of 
the linear independence of system (4), all the coefficients $\alpha_1, \ldots, \alpha_{d_0}$, 
$\beta_{d_0+1}, \ldots, \beta_{d_1}$ are also zero. This proves the linear independence 
of system (6). 

The reader can verify that our proof holds true for the case when 
the subspace $L_0$ is a zero subspace, i.e., $d_0 = 0$. 
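Formula (2) can be observed on a small example. The sketch below, in Python, computes dimensions as ranks of generating systems; the subspaces of four-dimensional row space, the hand-computed generator of their intersection, and the helper name `rank` are all assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a system of row vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical subspaces, given by generating systems.
L1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
L2 = [[0, 1, 1, 0], [0, 0, 0, 1]]
# A vector s*(0,1,1,0) + t*(0,0,0,1) lies in L1 only if t = 0,
# so the intersection is generated by (0, 1, 1, 0) (worked out by hand).
L0 = [[0, 1, 1, 0]]

d1, d2, d0 = rank(L1), rank(L2), rank(L0)
d = rank(L1 + L2)            # the sum is generated by both systems together
print(d, d1 + d2 - d0)       # 4 4
```

Here `L1 + L2` is list concatenation, which yields a generating system of the sum of the two subspaces.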

The range of values and the kernel (null space) of a linear 
transformation. Suppose we have a linear transformation $\varphi$ in a linear 
space $V_n$. If L is any linear subspace of the space $V_n$, then the 
collection $L\varphi$ of images of all vectors of L under the transformation $\varphi$ 
will also be a linear subspace, as follows directly from the definitions 
of a linear subspace and a linear transformation. In particular, the 
collection $V_n\varphi$ of images of all vectors of the space $V_n$ is a linear 
subspace. It is called the range of values of the transformation $\varphi$. Let 
us find the dimension of the range. To do this, note that since all 
matrices representing the transformation $\varphi$ in different bases are 
similar, it follows, due to the last theorem of Sec. 14, that they all 
have one and the same rank. This number can therefore be termed 
the rank of the linear transformation $\varphi$. 

The dimension of the range of values of a linear transformation $\varphi$ 
is equal to the rank of the transformation. 

Indeed, let $\varphi$ be represented in the basis $e_1, e_2, \ldots, e_n$ by the 
matrix A. The subspace $V_n\varphi$ is generated by the vectors 

$$e_1\varphi, e_2\varphi, \ldots, e_n\varphi \tag{8}$$

and therefore, as a particular case, any maximal linearly 
independent subsystem of system (8) will serve as a basis of the subspace 
$V_n\varphi$. However, the maximum number of linearly independent 
vectors in system (8) is equal to the maximum number of linearly 
independent rows of the matrix A, i.e., it is equal to the rank of the 
matrix. The theorem is proved. 

We know that under the linear transformation $\varphi$ the zero vector 
goes into itself. The collection $N(\varphi)$ of all vectors of the space $V_n$ 
which under $\varphi$ are mapped into the zero vector is consequently 
nonvoid and is evidently a linear subspace. This subspace is termed 
the null space of the transformation $\varphi$, and its dimension is called 
the nullity of this transformation. 

For any linear transformation $\varphi$ of space $V_n$, the sum of the rank 
and of the nullity of the transformation is equal to the dimension n of 
the whole space. 

Indeed, if r is the rank of the transformation $\varphi$, then the 
subspace $V_n\varphi$ has the following basis of r vectors: 

$$a_1, a_2, \ldots, a_r \tag{9}$$

In $V_n$ we can select the vectors 

$$b_1, b_2, \ldots, b_r \tag{10}$$

such that 

$$b_i\varphi = a_i, \qquad i = 1, 2, \ldots, r$$

The choice of vectors (10) is not unique, naturally. If some 
nontrivial linear combination of vectors (10) were mapped into 
zero by the transformation, in particular, if the vectors (10) were 
linearly dependent, then the vectors (9) would themselves be linearly 
dependent, but this runs counter to our assumption. And so the 
linear subspace L generated by the vectors (10) has dimension r 
and its intersection with the subspace $N(\varphi)$ is zero. 

On the other hand, the sum of the subspaces L and $N(\varphi)$ 
coincides with the entire space $V_n$. Indeed, if c is any vector of the space, 
it follows that the vector $d = c\varphi$ of course belongs to the subspace 
$V_n\varphi$. Then in the subspace L there will be a vector b such that 

$$b\varphi = d$$

The vector b is written in terms of system (10) with the same 
coefficients as is the vector d in terms of the basis (9). From this we have 

$$c = b + (c - b)$$

and the vector c - b is contained in the subspace $N(\varphi)$, since 

$$(c - b)\varphi = c\varphi - b\varphi = d - d = 0$$

The assertion of the theorem follows from the results obtained 
and from the formula (2) that was proved earlier. 
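The theorem on rank and nullity can be watched at work on a singular matrix. The following Python sketch finds the rank by row reduction and, independently, a basis of the null space, that is, of all coordinate rows x with xA = 0. The matrix A and the helper names `rref` and `kernel_basis` are illustrative assumptions.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M and the list of its pivot columns."""
    rows = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for col in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    return rows, pivots

def kernel_basis(A):
    """Basis of the coordinate rows x with x A = 0 (solve A^T x^T = 0)."""
    n = len(A)
    R, pivots = rref([list(col) for col in zip(*A)])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for k, pc in enumerate(pivots):
            x[pc] = -R[k][free]
        basis.append(x)
    return basis

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]    # hypothetical matrix; row 2 = 2 * row 1
r = len(rref(A)[1])                      # rank = number of pivots
kernel = kernel_basis(A)
images = [[sum(x[i] * A[i][j] for i in range(3)) for j in range(3)] for x in kernel]
print(r, len(kernel))                    # 2 1
```

The rank and the nullity are computed by separate routines, so their sum being n = 3 is a genuine check rather than a restatement of the theorem.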

Nonsingular linear transformations. A linear transformation $\varphi$ 
of a linear space $V_n$ is called nonsingular if it satisfies any one of 
the following conditions, the equivalence of which follows directly 
from the theorems proved above. 

1. The rank of the transformation $\varphi$ is equal to n. 

2. The entire space $V_n$ serves as the range of values of the 
transformation $\varphi$. 

3. The nullity of the transformation $\varphi$ is zero. 

There are many other definitions of nonsingular linear 
transformations that are equivalent to those given above, for instance, 
definitions 4 to 6. 

4. Distinct vectors of the space $V_n$ have distinct images under 
the transformation $\varphi$. 

Indeed, if a transformation $\varphi$ has Property 4, then the null space 
of this transformation consists of the zero vector alone, i.e., 
Property 3 holds. But if the vectors a and b are such that $a \neq b$, but 
$a\varphi = b\varphi$, then $a - b \neq 0$, but $(a - b)\varphi = 0$, or Property 3 does 
not hold. 

From 2 and 4 there follows 

5. The transformation $\varphi$ is a one-to-one mapping of the space $V_n$ 
onto this whole space. 

From 5 it follows that a nonsingular linear transformation $\varphi$ 
has an inverse transformation $\varphi^{-1}$ which carries any vector $a\varphi$ into the 
vector a, 

$$(a\varphi)\varphi^{-1} = a$$

The transformation $\varphi^{-1}$ is linear since 

$$(a\varphi + b\varphi)\varphi^{-1} = [(a + b)\varphi]\varphi^{-1} = a + b, \qquad [\alpha(a\varphi)]\varphi^{-1} = [(\alpha a)\varphi]\varphi^{-1} = \alpha a$$

From the definition of the transformation $\varphi^{-1}$ it follows that 

$$\varphi\varphi^{-1} = \varphi^{-1}\varphi = \varepsilon \tag{11}$$

The equalities (11) can themselves be viewed as a definition of an 
inverse transformation. Then from this and from the last results of 
the preceding section it follows that if a nonsingular linear 
transformation $\varphi$ is represented in some basis by the matrix A (which is 
nonsingular due to Property 1), then the transformation $\varphi^{-1}$ is represented 
in that basis by the matrix $A^{-1}$. 

We thus arrive at the following definition of a nonsingular linear 
transformation. 

6. A transformation $\varphi$ has an inverse linear transformation $\varphi^{-1}$. 

33. Characteristic Roots and Eigenvalues 


Let $A = (\alpha_{ij})$ be a square matrix of order n with real elements. 
On the other hand, let $\lambda$ be some unknown. Then the matrix $A - \lambda E$, 
where E is a unit matrix of order n, is called the characteristic matrix 
of the matrix A. Since in the matrix $\lambda E$ the principal diagonal is 
occupied by $\lambda$ and all other elements are zero, we have 

$$A - \lambda E = \begin{pmatrix} \alpha_{11} - \lambda & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} - \lambda & \cdots & \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} - \lambda \end{pmatrix}$$

The determinant of the matrix $A - \lambda E$ is a polynomial in $\lambda$ 
of degree n. Indeed, the product of elements on the principal 
diagonal is a polynomial in $\lambda$ with highest-degree term $(-1)^n \lambda^n$; all 
the other terms of the determinant lack at least two of the 
elements on the principal diagonal; therefore, their degree 
in $\lambda$ does not exceed n - 2. It is easy to find the coefficients 
of this polynomial. For instance, the coefficient of $\lambda^{n-1}$ is equal to 
$(-1)^{n-1} (\alpha_{11} + \alpha_{22} + \ldots + \alpha_{nn})$ and the constant term 
coincides with the determinant of matrix A. 

The polynomial $|A - \lambda E|$ of degree n is called the 
characteristic polynomial of matrix A, and its roots (which may be real or 
complex) are termed the characteristic roots of the matrix. 

Similar matrices have the same characteristic polynomials, and, 
consequently, the same characteristic roots. 

To see this, let 

$$B = Q^{-1}AQ$$

Then, taking into account that the matrix $\lambda E$ commutes with the 
matrix Q, and $|Q^{-1}| = |Q|^{-1}$, we have 

$$|B - \lambda E| = |Q^{-1}AQ - \lambda E| = |Q^{-1}(A - \lambda E)Q| = |Q|^{-1} \cdot |A - \lambda E| \cdot |Q| = |A - \lambda E|$$

The proof is complete. 
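The invariance of the characteristic polynomial under similarity can also be checked by direct computation. The Python sketch below uses the Faddeev-LeVerrier recurrence, a standard technique that is not the method of the text, to obtain the coefficients of $|\lambda E - A|$, which differs from $|A - \lambda E|$ only by the factor $(-1)^n$. The matrices A and Q and all helper names are illustrative assumptions.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def char_poly(A):
    """Coefficients of det(lambda*E - A), highest power first (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A * M_{k-1} + c * E, where c is the coefficient found last.
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(A, M)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

A = [[-2, 1, 0], [1, 3, 2], [0, -4, 1]]     # hypothetical matrix
Q = [[5, -1, -2], [2, 3, 0], [-2, 1, 1]]    # hypothetical nonsingular matrix
B = matmul(matmul(inverse(Q), A), Q)        # B = Q^{-1} A Q

print(char_poly(A) == char_poly(B))
print(char_poly([[2, 1], [0, 3]]))          # lambda^2 - 5 lambda + 6
```

In this monic form the coefficient of $\lambda^{n-1}$ is minus the sum of the diagonal elements, in agreement with the coefficient $(-1)^{n-1}(\alpha_{11} + \ldots + \alpha_{nn})$ stated above for $|A - \lambda E|$.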

From this result it follows (by the theorem proved in Sec. 31 
on the relationship between matrices representing a linear 
transformation in different bases) that although the linear transformation 
$\varphi$ may be represented in different bases by different matrices, all the 
matrices have one and the same set of characteristic roots. These roots 
can therefore be called the characteristic roots of the transformation $\varphi$. 
The set of these characteristic roots, each root being taken with 
the multiplicity that it has in the characteristic polynomial, is 
called the spectrum of the linear transformation $\varphi$. 

Characteristic roots play a very important role in the study of 
linear transformations, as the reader will have ample opportunity 
to see. We now investigate one of the applications of characteristic 
roots. 

Let there be given a linear transformation $\varphi$ in a real linear space 
$V_n$. If a nonzero vector b is carried by the transformation into 
a vector proportional to b, 

$$b\varphi = \lambda_0 b \tag{1}$$

where $\lambda_0$ is some real number, then the vector b is called an 
eigenvector of the transformation $\varphi$, and the number $\lambda_0$ is called an eigenvalue 
of this transformation. We say that the eigenvector b corresponds 
to the eigenvalue $\lambda_0$. 

Note that since $b \neq 0$, the number $\lambda_0$ which satisfies 
Condition (1) is uniquely defined for the vector b. Also bear in mind that 
the zero vector is not considered to be an eigenvector of the 
transformation $\varphi$, although it satisfies Condition (1) for any $\lambda_0$. 

Rotation of the Euclidean plane about the origin through an 
angle that is not a multiple of $\pi$ is an example of a linear 
transformation which has no eigenvectors. An instance of another extreme 
case is the stretching of a plane in which all vectors issuing from 
the origin are stretched, say, five times. This is a linear 
transformation and all nonzero vectors of the plane are its eigenvectors; all 
of them correspond to the eigenvalue 5. 
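Both extreme cases can be seen from the characteristic polynomial of a matrix of order 2, which is $\lambda^2 - (\alpha_{11} + \alpha_{22})\lambda + |A|$: real roots exist exactly when its discriminant is nonnegative. The Python sketch below checks this for one rotation and for the fivefold stretching; the particular angle (cosine 3/5, sine 4/5) and the helper name `discriminant` are assumptions chosen to keep the arithmetic exact.

```python
from fractions import Fraction

def discriminant(A):
    """Discriminant of the characteristic polynomial of a 2x2 matrix."""
    (p, q), (r, s) = A
    trace, det = p + s, p * s - q * r
    return trace * trace - 4 * det

c, s = Fraction(3, 5), Fraction(4, 5)    # cosine and sine of a hypothetical angle
rotation = [[c, s], [-s, c]]
stretch = [[5, 0], [0, 5]]

print(discriminant(rotation))            # negative: no real characteristic roots
print(discriminant(stretch))             # zero: the double root 5

# Under the stretching every coordinate row is simply multiplied by 5.
x = [2, -7]
image = [sum(x[i] * stretch[i][j] for i in range(2)) for j in range(2)]
```

A negative discriminant means the rotation has no real characteristic roots and hence, by the theorem proved next, no eigenvectors.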

Only the real characteristic roots (if they exist) of a linear 
transformation serve as eigenvalues of the transformation. 

Let a transformation $\varphi$ have a matrix $A = (\alpha_{ij})$ in the basis 
$e_1, e_2, \ldots, e_n$ and let the vector 

$$b = \sum_{i=1}^{n} \beta_i e_i$$

be an eigenvector of the transformation $\varphi$, 

$$b\varphi = \lambda_0 b \tag{2}$$

As was proved in Sec. 31, 

$$b\varphi = [(\beta_1, \beta_2, \ldots, \beta_n) A]\, e \tag{3}$$

Equations (2) and (3) lead to the system of equations 

$$\begin{aligned} \beta_1\alpha_{11} + \beta_2\alpha_{21} + \ldots + \beta_n\alpha_{n1} &= \lambda_0\beta_1, \\ \beta_1\alpha_{12} + \beta_2\alpha_{22} + \ldots + \beta_n\alpha_{n2} &= \lambda_0\beta_2, \\ \ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots \\ \beta_1\alpha_{1n} + \beta_2\alpha_{2n} + \ldots + \beta_n\alpha_{nn} &= \lambda_0\beta_n \end{aligned} \tag{4}$$

Since $b \neq 0$, not all the numbers $\beta_1, \beta_2, \ldots, \beta_n$ are zero. Thus, 
by (4), the system of homogeneous linear equations 

$$\begin{aligned} (\alpha_{11} - \lambda_0) x_1 + \alpha_{21} x_2 + \ldots + \alpha_{n1} x_n &= 0, \\ \alpha_{12} x_1 + (\alpha_{22} - \lambda_0) x_2 + \ldots + \alpha_{n2} x_n &= 0, \\ \ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots \\ \alpha_{1n} x_1 + \alpha_{2n} x_2 + \ldots + (\alpha_{nn} - \lambda_0) x_n &= 0 \end{aligned} \tag{5}$$

has a nontrivial solution and for this reason its determinant is equal 
to zero: 

$$\begin{vmatrix} \alpha_{11} - \lambda_0 & \alpha_{21} & \cdots & \alpha_{n1} \\ \alpha_{12} & \alpha_{22} - \lambda_0 & \cdots & \alpha_{n2} \\ \vdots & \vdots & & \vdots \\ \alpha_{1n} & \alpha_{2n} & \cdots & \alpha_{nn} - \lambda_0 \end{vmatrix} = 0 \tag{6}$$

Taking the transpose, we get 

$$|A - \lambda_0 E| = 0 \tag{7}$$

that is to say, the eigenvalue $\lambda_0$ actually does prove to be a 
characteristic root (and, quite naturally, a real root) of the matrix A and, 
hence, of the linear transformation $\varphi$. 

Conversely, let $\lambda_0$ be any real characteristic root of the 
transformation $\varphi$ and, consequently, of the matrix A. Then we have 
equation (7) and therefore equation (6), which is obtained from 
(7) by taking the transpose. From this it follows that the system of 
homogeneous linear equations (5) has a nontrivial solution, and even 
a real one, since all the coefficients of the system are real. If we denote 
this solution by 

$$(\beta_1, \beta_2, \ldots, \beta_n) \tag{8}$$

we have equations (4). Use b to denote the vector of space $V_n$ having 
in the basis $e_1, e_2, \ldots, e_n$ the coordinate row (8). It is clear that 
$b \neq 0$. Then equation (3) holds and from (4) and (3) follows (2). 
Thus, vector b has proved to be an eigenvector (of the 
transformation $\varphi$) corresponding to the eigenvalue $\lambda_0$. This proves the theorem. 

Note that if we considered a complex linear space, then the 
demand that the characteristic root be real would be superfluous. 
In other words, we would have proved the following theorem: 
The characteristic roots of a linear transformation of a complex linear 
space, and only these roots, serve as eigenvalues of the transformation. 
Whence it follows that in a complex linear space, any linear 
transformation has eigenvectors. 

Returning to our real case, note that the collection of eigenvectors 
of the linear transformation $\varphi$ which correspond to the eigenvalue $\lambda_0$ 
coincides with the collection of nontrivial real solutions of the 
system of homogeneous linear equations (5). Whence it follows that 
the collection of eigenvectors of the linear transformation $\varphi$ which 
correspond to the eigenvalue $\lambda_0$ will, after the zero vector has been adjoined 
to it, be a linear subspace of the space $V_n$. Indeed, from what was 
proved in Sec. 12, it follows that the collection of (real) solutions 
of any system of homogeneous linear equations in n unknowns is a 
linear subspace of the space $V_n$. 

Linear transformations with a simple spectrum. In many cases 
it is necessary to know whether a given linear transformation $\varphi$ can 
have a diagonal matrix in some basis. As a matter of fact, far from 
every linear transformation can be represented by a diagonal 
matrix. The necessary and sufficient conditions for this will be 
indicated in Sec. 61. In the meantime we wish to indicate one sufficient 
condition. 

We will first prove the following auxiliary results. 

A linear transformation $\varphi$ is represented by a diagonal matrix 
in a basis $e_1, e_2, \ldots, e_n$ if and only if all the vectors of the basis are 
eigenvectors of the transformation $\varphi$. 

Indeed, the equation 

$$e_i\varphi = \lambda_i e_i$$

is equivalent to the fact that in the ith row of the matrix 
representing the transformation $\varphi$ in the indicated basis all off-diagonal 
elements are zero and the principal diagonal has the number $\lambda_i$ (in 
the ith position). 

The eigenvectors $b_1, b_2, \ldots, b_k$ of the linear transformation $\varphi$ 
which correspond to different eigenvalues constitute a linearly 
independent system. 

We shall prove this assertion by induction with respect to k, 
since for k = 1 it holds true: a single eigenvector, being nonzero, 
constitutes a linearly independent system. Let 

$$b_i\varphi = \lambda_i b_i, \qquad i = 1, 2, \ldots, k$$

and 

$$\lambda_i \neq \lambda_j \quad \text{for } i \neq j$$

If there exists a linear dependence 

$$\alpha_1 b_1 + \alpha_2 b_2 + \ldots + \alpha_k b_k = 0 \tag{9}$$

where, for example, $\alpha_1 \neq 0$, then, applying the transformation $\varphi$ 
to both sides of (9), we get 

$$\alpha_1\lambda_1 b_1 + \alpha_2\lambda_2 b_2 + \ldots + \alpha_k\lambda_k b_k = 0$$

Subtracting equation (9) multiplied by $\lambda_k$, we get 

$$\alpha_1(\lambda_1 - \lambda_k) b_1 + \alpha_2(\lambda_2 - \lambda_k) b_2 + \ldots + \alpha_{k-1}(\lambda_{k-1} - \lambda_k) b_{k-1} = 0$$

which yields a nontrivial linear dependence between the vectors 
$b_1, b_2, \ldots, b_{k-1}$, since $\alpha_1(\lambda_1 - \lambda_k) \neq 0$. This contradicts the 
induction hypothesis. 

We say that a linear transformation of a real linear space V, 
has a simple spectrum if all its characteristic roots are real and di- 
stinct. Consequently, the transformation ọ has n distinct eigenva- 
lues and therefore, by the theorem just proved, the space V, has a 
basis composed of the eigenvectors of this transformation. Thus, 
any linear transformation with a simple spectrum may be represented 
by a diagonal matriz. 

Passing from the linear transformation to the matrix represen- 
ting it, we obtain the following result. 

Any matrix whose characteristic roots are all real and distinct is 
similar to a diagonal matrix, or we say that such a matrix can be re- 
duced to diagonal form (diagonalized). 
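
As a minimal sketch of this result, the following Python fragment diagonalizes a 2 x 2 matrix with two distinct real characteristic roots. It assumes the row-vector convention of this text (an eigenvector b satisfies bA = λb); the matrix A and all function names are illustrative choices and do not come from the text.

```python
import math

def characteristic_roots(a):
    """Roots of the characteristic polynomial of a real 2x2 matrix.

    Raises ValueError unless the roots are real and distinct (simple spectrum).
    """
    (p, q), (r, s) = a
    trace, det = p + s, p * s - q * r
    disc = trace * trace - 4 * det
    if disc <= 0:
        raise ValueError("characteristic roots are not real and distinct")
    root = math.sqrt(disc)
    return (trace - root) / 2, (trace + root) / 2

def row_eigenvector(a, lam):
    """A nonzero row vector b with b*A = lam*b (row-vector convention)."""
    (p, q), (r, s) = a
    # Both candidates solve b(A - lam*E) = 0; pick the one that is not (nearly) zero.
    candidates = [(r, lam - p), (lam - s, q)]
    return max(candidates, key=lambda b: abs(b[0]) + abs(b[1]))

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(t):
    (p, q), (r, s) = t
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

A = [[4, 1], [2, 3]]                                   # hypothetical example matrix
roots = characteristic_roots(A)                        # two distinct real roots
T = [list(row_eigenvector(A, lam)) for lam in roots]   # rows of T are eigenvectors
D = matmul(matmul(T, A), inverse(T))                   # matrix in the eigenvector basis
```

Under these assumptions D is diagonal up to rounding, with the characteristic roots of A on its principal diagonal in the order returned by characteristic_roots.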


CHAPTER 8 


EUCLIDEAN SPACES 


34. Definition of a Euclidean Space. 
Orthonormal Bases 


The concept of an n-dimensional linear space does not by any 
means fully generalize the concept of a plane or three-dimensional 
Euclidean space: in the n-dimensional case, for n > 3, neither 
the length of a vector nor the angle between vectors is defined and 
it is therefore impossible to develop the rich geometrical theory so 
familiar to the reader for n = 2 and n = 3. It turns out, however, 
that we can rectify the situation in the following manner. 

From analytic geometry we know that for two-dimensional 
(a plane) and three-dimensional space we can introduce the concept 
of scalar multiplication of vectors. It is defined by means of the 
lengths of the vectors and the angle between them; it appears, howe- 
ver, that both the length of a vector and the angle between vectors 
can, in turn, be expressed in terms of scalar products. We will 
therefore define the concept of scalar multiplication (we will define 
it axiomatically) for any n-dimensional linear space. This will be 
done with the aid of certain properties which we know the scalar 
multiplication of vectors in the plane or in three-dimensional space 
actually possesses. Considering the immediate reasons for this mate- 
rial being included in the course of higher algebra, we dispense 
with the definitions of the length of a vector and the angle between 
vectors. The reader interested in the construction of geometry in n- 
dimensional spaces is referred to the special literature, in particu- 
lar, to more exhaustive texts on linear algebra. 

The reader should bear in mind that, with the exception of the 
end of this section, the whole chapter deals solely with real linear 
spaces. 

We shall say that scalar multiplication is defined in an
n-dimensional real linear space V_n if to every pair of vectors a, b
there is associated a real number denoted by the symbol (a, b) and
called the scalar product of the vectors a and b, and if the following
conditions are satisfied (here a, b, c are any vectors of the space
V_n, and α is any real number, or scalar):

    I. (a, b) = (b, a).
    II. (a + b, c) = (a, c) + (b, c).
    III. (\alpha a, b) = \alpha (a, b).
    IV. If a \ne 0, then the scalar square of the vector a is strictly
        positive:

        (a, a) > 0

Note that from III we have, for α = 0, the equation

    (0, b) = 0    (1)

which states that the scalar product of the zero vector by any vector b
is zero: in particular, the scalar square of the zero vector is also zero.

From II and III (together with I, which carries them over to the
second factor) there immediately follows a formula for the scalar
product of linear combinations of two systems of vectors:

    \left( \sum_{i=1}^{k} \alpha_i a_i, \sum_{j=1}^{l} \beta_j b_j \right) = \sum_{i=1}^{k} \sum_{j=1}^{l} \alpha_i \beta_j (a_i, b_j)    (2)


If scalar multiplication is defined in an n-dimensional linear 
space, then the space is termed n-dimensional Euclidean space. 

It is possible to define scalar multiplication in an n-dimensional
linear space V_n for any n, which is to say that we can convert this
space into a Euclidean space.

Indeed, in V_n take any basis e_1, e_2, ..., e_n. If

    a = \sum_{i=1}^{n} \alpha_i e_i,    b = \sum_{i=1}^{n} \beta_i e_i

then put

    (a, b) = \sum_{i=1}^{n} \alpha_i \beta_i    (3)

It is easy to see that Conditions I-IV will be fulfilled, that is,
equation (3) defines scalar multiplication in the space V_n.
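
The verification of Conditions I-IV for definition (3) can be spot-checked numerically. The sketch below represents vectors by their coordinate rows in the chosen basis; the sample vectors and the helper names are assumptions made for illustration, and a check on sample data is of course no substitute for the proof.

```python
def scalar_product(a, b):
    """Formula (3): sum of products of corresponding coordinates."""
    if len(a) != len(b):
        raise ValueError("coordinate rows must have the same length")
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def scale(alpha, a):
    return [alpha * x for x in a]

# Sample data (hypothetical): three coordinate rows and a scalar.
a, b, c, alpha = [1, -2, 3], [4, 0, -1], [2, 5, 7], 3

axiom_1 = scalar_product(a, b) == scalar_product(b, a)
axiom_2 = scalar_product(add(a, b), c) == scalar_product(a, c) + scalar_product(b, c)
axiom_3 = scalar_product(scale(alpha, a), b) == alpha * scalar_product(a, b)
axiom_4 = scalar_product(a, a) > 0          # a is nonzero
```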

Generally speaking, we see that in n-dimensional linear space 
it is possible to specify scalar multiplication in many different 
ways. Naturally, definition (3) depends on the choice of the basis, 
but as yet we do not know whether it is possible to introduce scalar 
multiplication in any other fundamentally different manner or not. 
Our immediate purpose is to survey all possible modes of converting 
n-dimensional linear space into Euclidean space and of establishing 
the fact that in a certain sense there is only one n-dimensional Eu- 
clidean space for any n. 

Suppose we have an arbitrary n-dimensional Euclidean space E_n,
which means that scalar multiplication has been introduced in some
fashion into an n-dimensional linear space. The vectors a and b
are orthogonal if their scalar product is zero,

    (a, b) = 0

From (1) it follows that the zero vector is orthogonal to any vector;
however, there can be nonzero orthogonal vectors too.

A set of vectors is called an orthogonal system if all the vectors 
are pairwise orthogonal. 

Every orthogonal system of nonzero vectors is linearly independent. 


Indeed, let there be a system of vectors a_1, a_2, ..., a_k in E_n
and let a_i ≠ 0, i = 1, 2, ..., k, and

    (a_i, a_j) = 0  for  i \ne j    (4)

If

    \alpha_1 a_1 + \alpha_2 a_2 + ... + \alpha_k a_k = 0

then by forming the scalar product of both sides of this equation by
the vector a_i, 1 ≤ i ≤ k, we get [by (1), (2) and (4)]

    0 = (0, a_i) = (\alpha_1 a_1 + \alpha_2 a_2 + ... + \alpha_k a_k, a_i)
      = \alpha_1 (a_1, a_i) + \alpha_2 (a_2, a_i) + ... + \alpha_k (a_k, a_i)
      = \alpha_i (a_i, a_i)

Whence, since (a_i, a_i) > 0 by IV, it follows that α_i = 0, i = 1,
2, ..., k, which is what we set out to prove.


We now describe the orthogonalization process, which is a means
of passing from any linearly independent system of k vectors

    a_1, a_2, ..., a_k    (5)

of Euclidean space E_n to an orthogonal system, also consisting of k
nonzero vectors. We denote these vectors by b_1, b_2, ..., b_k.

Let us put b_1 = a_1, which is to say that the first vector of
system (5) will enter into the orthogonal system we are building.
After that, put

    b_2 = \alpha_1 b_1 + a_2

Since b_1 = a_1 and the vectors a_1 and a_2 are linearly independent,
it follows that the vector b_2 is different from zero for any scalar
α_1. We choose this scalar remembering that the vector b_2 must be
orthogonal to the vector b_1:

    0 = (b_1, b_2) = (b_1, \alpha_1 b_1 + a_2) = \alpha_1 (b_1, b_1) + (b_1, a_2)

whence, by IV,

    \alpha_1 = -\frac{(b_1, a_2)}{(b_1, b_1)}
Suppose an orthogonal system of nonzero vectors b_1, b_2, ..., b_l
has already been constructed; we also assume that for any i,
1 ≤ i ≤ l, the vector b_i is a linear combination of the vectors a_1,
a_2, ..., a_i. Then this assumption will also hold for the vector
b_{l+1}, if it is chosen in the form

    b_{l+1} = \alpha_1 b_1 + \alpha_2 b_2 + ... + \alpha_l b_l + a_{l+1}

The vector b_{l+1} will then be different from zero, since system (5)
is linearly independent and the vector a_{l+1} does not enter into the
notation of vectors b_1, b_2, ..., b_l. We choose the coefficients α_i,
i = 1, 2, ..., l, from the fact that the vector b_{l+1} must be
orthogonal to all the vectors b_i, i = 1, 2, ..., l:

    0 = (b_i, b_{l+1}) = (b_i, \alpha_1 b_1 + \alpha_2 b_2 + ... + \alpha_l b_l + a_{l+1})
      = \alpha_1 (b_i, b_1) + \alpha_2 (b_i, b_2) + ... + \alpha_l (b_i, b_l) + (b_i, a_{l+1})

whence, since the vectors b_1, b_2, ..., b_l are mutually orthogonal,

    \alpha_i (b_i, b_i) + (b_i, a_{l+1}) = 0

or

    \alpha_i = -\frac{(b_i, a_{l+1})}{(b_i, b_i)},    i = 1, 2, ..., l

Continuing this process, we can construct the desired orthogonal
system b_1, b_2, ..., b_k.
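
The orthogonalization process translates almost line by line into code. The following is a minimal sketch, assuming that vectors are given by their coordinates in an orthonormal basis, so that the scalar product is the sum of products of coordinates; exact rational arithmetic is used to avoid rounding, and the names orthogonalize and dot are illustrative.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(vectors):
    """Orthogonalization without normalization.

    Each new vector is b_{l+1} = alpha_1 b_1 + ... + alpha_l b_l + a_{l+1}
    with alpha_i = -(b_i, a_{l+1}) / (b_i, b_i).
    """
    result = []
    for a in vectors:
        a = [Fraction(x) for x in a]
        b = a[:]
        for prev in result:
            alpha = -dot(prev, a) / dot(prev, prev)
            b = [bi + alpha * pi for bi, pi in zip(b, prev)]
        if all(x == 0 for x in b):
            raise ValueError("the given system is linearly dependent")
        result.append(b)
    return result

# Hypothetical example: a linearly independent system in three dimensions.
system = orthogonalize([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

The first vector of the result coincides with the first vector of the given system, in agreement with the choice b_1 = a_1.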

Applying the orthogonalization process to an arbitrary basis
of the space E_n, we obtain an orthogonal system of n nonzero
vectors, that is to say, an orthogonal basis, since (as has been proved)
this system is linearly independent. Now, using the remark made 
in connection with the first step of the process of orthogonalization, 
and also taking into account the fact that any nonzero vector may 
be included in some basis of the space, we can even make the follo- 
wing assertion. 

Every Euclidean space possesses orthogonal bases, and any nonzero 
vector of this space enters into some orthogonal basis. 

In what follows, an important role will be played by a special
type of orthogonal basis. Bases of this kind correspond to the
rectangular Cartesian systems of coordinates used in analytic geometry.

We shall call a vector b normalized if its scalar square is equal
to unity:

    (b, b) = 1

If a ≠ 0, whence (a, a) > 0, then the transition to the vector

    b = \frac{1}{\sqrt{(a, a)}} a

is termed normalization of the vector a. The vector b is normalized
since

    (b, b) = \left( \frac{1}{\sqrt{(a, a)}} a, \frac{1}{\sqrt{(a, a)}} a \right) = \left( \frac{1}{\sqrt{(a, a)}} \right)^2 (a, a) = 1

A basis e_1, e_2, ..., e_n for the Euclidean space E_n is called
orthonormal if it is orthogonal and all its vectors are normalized,
that is,

    (e_i, e_j) = 0,    i \ne j
    (e_i, e_i) = 1,    i = 1, 2, ..., n    (6)


Every Euclidean space has orthonormal bases. 

To prove this, it will suffice to take any orthogonal basis and
to normalize all its vectors. The basis will remain orthogonal, since
for any α and β it follows from (a, b) = 0 that

    (\alpha a, \beta b) = \alpha \beta (a, b) = 0
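
The passage from an orthogonal basis to an orthonormal one may be sketched as follows. The fragment assumes coordinates taken in some orthonormal basis, so that the scalar product is the usual sum of products; the sample basis and the function names are illustrative only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    """Return a multiplied by 1 / sqrt((a, a)); the zero vector is rejected."""
    square = dot(a, a)
    if square == 0:
        raise ValueError("the zero vector cannot be normalized")
    factor = 1 / math.sqrt(square)
    return [factor * x for x in a]

# Hypothetical orthogonal (but not normalized) basis of a 3-dimensional space.
orthogonal_basis = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
orthonormal_basis = [normalize(v) for v in orthogonal_basis]
```

Up to rounding, the scalar products of the resulting vectors satisfy equations (6).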


A basis e_1, e_2, ..., e_n of a Euclidean space E_n is orthonormal if
and only if the scalar product of any two vectors of the space is equal
to the sum of the products of the corresponding coordinates of the vectors
in the indicated basis; that is, from

    a = \sum_{i=1}^{n} \alpha_i e_i,    b = \sum_{i=1}^{n} \beta_i e_i    (7)

follows

    (a, b) = \sum_{i=1}^{n} \alpha_i \beta_i    (8)

Indeed, if equations (6) hold for our basis, then

    (a, b) = \left( \sum_{i=1}^{n} \alpha_i e_i, \sum_{j=1}^{n} \beta_j e_j \right) = \sum_{i,j=1}^{n} \alpha_i \beta_j (e_i, e_j) = \sum_{i=1}^{n} \alpha_i \beta_i

Conversely, if our basis is such that for any vectors a and b written
in this basis in the form (7), equation (8) holds true, then, taking
for a and b any two vectors e_i and e_j in the basis, which are distinct
or the same, we can derive (6) from (8).

Comparing the result just obtained with the earlier given proof
of the existence of n-dimensional Euclidean spaces for any n, we can
make the following assertion: if an arbitrary basis is chosen in an
n-dimensional linear space V_n, then in V_n we can specify scalar
multiplication so that in the resulting Euclidean space the chosen basis
will be one of the orthonormal bases.

Isomorphism of Euclidean spaces. Euclidean spaces E and E' are
termed isomorphic if we can establish between the vectors of these
spaces a one-to-one correspondence such that the following
requirements are met.

(1) The correspondence is an isomorphic correspondence between
E and E', which are regarded as linear spaces (see Sec. 29).

(2) In this correspondence the scalar product is preserved; in
other words, if the images of the vectors a and b of E are the
vectors a' and b' of E', then

    (a, b) = (a', b')    (9)

From Condition (1) it follows immediately that isomorphic
Euclidean spaces have one and the same dimension. We will prove the
converse.

Any Euclidean spaces E and E' having the same dimension n are
isomorphic to each other.

In the spaces E and E', choose the orthonormal bases

    e_1, e_2, ..., e_n    (10)

and, respectively,

    e'_1, e'_2, ..., e'_n    (11)

If we associate every vector

    a = \sum_{i=1}^{n} \alpha_i e_i

in E with a vector

    a' = \sum_{i=1}^{n} \alpha_i e'_i

in E', having in the basis (11) the same coordinates as the vector a
in the basis (10), we will obviously get an isomorphic correspondence
between the linear spaces E and E'. We will show that (9) holds as
well: if

    b = \sum_{i=1}^{n} \beta_i e_i,    b' = \sum_{i=1}^{n} \beta_i e'_i

then, by (8) [use the fact that the bases (10) and (11) are
orthonormal!],

    (a, b) = \sum_{i=1}^{n} \alpha_i \beta_i = (a', b')


It is natural not to consider isomorphic Euclidean spaces as 
distinct, and so for every n there exists a unique n-dimensional Eu- 
clidean space in the same sense that for every n there exists a 
unique n-dimensional real linear space. 

The concepts and results of this section may be extended to
the case of complex linear spaces in the following manner. A complex
linear space is called a unitary space if scalar multiplication
is given and (a, b) is, in general, a complex number. Axioms II-IV
must hold true (note, in the statement of Axiom IV, that the scalar
square of a nonzero vector is real and is strictly positive), and Axiom
I is replaced by the axiom

    I'. (a, b) = \overline{(b, a)}

where, as usual, the bar denotes the complex conjugate.
Consequently, scalar multiplication will no longer be
commutative. Still, an equation that is symmetric to Axiom II holds true,

    II'. (a, b + c) = (a, b) + (a, c)

since

    (a, b + c) = \overline{(b + c, a)} = \overline{(b, a) + (c, a)}
               = \overline{(b, a)} + \overline{(c, a)} = (a, b) + (a, c)

On the other hand,

    III'. (a, \alpha b) = \bar{\alpha} (a, b)

since

    (a, \alpha b) = \overline{(\alpha b, a)} = \overline{\alpha (b, a)} = \bar{\alpha} \overline{(b, a)} = \bar{\alpha} (a, b)


The concepts of orthogonality and of an orthonormal system of 
vectors are carried over to the case of unitary spaces without any 
alterations. As before, proof is given of the existence of orthonormal 
bases in any finite-dimensional unitary space. Here, however, if
e_1, e_2, ..., e_n is an orthonormal basis and the vectors a, b have the
notations (7) in this basis, then

    (a, b) = \sum_{i=1}^{n} \alpha_i \bar{\beta}_i
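
The coordinate formula for the scalar product in a unitary space, together with Axioms I' and III', can be illustrated with complex numbers in Python. This is a sketch on sample data; the vectors a, b, the scalar c and the name hermitian_product are assumptions introduced for the illustration.

```python
def hermitian_product(a, b):
    """(a, b) = sum of alpha_i * conj(beta_i), coordinates in an orthonormal basis."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

# Hypothetical coordinate rows and scalar.
a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1j]
c = 2 + 1j

ab = hermitian_product(a, b)
ba = hermitian_product(b, a)
aa = hermitian_product(a, a)                       # real and strictly positive
a_cb = hermitian_product(a, [c * y for y in b])    # scalar taken out of the second factor
```

Here ab is the conjugate of ba (Axiom I'), and a_cb equals the conjugate of c times ab (property III').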

The results of the other sections of this chapter can also be ex- 
tended from Euclidean to unitary spaces, but we will not do this 
and will refer the interested reader to special books on linear algebra. 


35. Orthogonal Matrices, Orthogonal Transformations 


Let there be given a real linear transformation of n unknowns:

    x_i = \sum_{k=1}^{n} q_{ik} y_k,    i = 1, 2, ..., n    (1)

Denote the matrix of the transformation by Q. This transformation
carries the sum of the squares of the unknowns x_1, x_2, ..., x_n, that
is, the quadratic form x_1^2 + x_2^2 + ... + x_n^2, which is the normal
form of positive definite quadratic forms (see Sec. 28), into a certain
quadratic form in the unknowns y_1, y_2, ..., y_n. It may happen that
this new quadratic form is itself a sum of the squares of the unknowns
y_1, y_2, ..., y_n; that is, we can have the equation

    x_1^2 + x_2^2 + ... + x_n^2 = y_1^2 + y_2^2 + ... + y_n^2    (2)

which, after replacing the unknowns x_1, x_2, ..., x_n by their
expressions (1), becomes an identity. The linear transformation of
unknowns (1) which has this property (or, as we say, such as leaves
the sum of the squares of the unknowns invariant) is called an
orthogonal transformation of the unknowns. Its matrix Q is an orthogonal
matrix.

There are many other definitions of an orthogonal transformation 
and an orthogonal matrix which are equivalent to those given above. 
We now give some of them that will be needed in the sequel. 

In Sec. 26 we gave a rule for the transformation of the matrix
of a quadratic form under a linear transformation of the unknowns.
Applying it to our case and taking into account that the unit
matrix E is the matrix of a quadratic form (being the sum of the squares
of all the unknowns), we find that equation (2) is equivalent to the
matrix equation

    Q'EQ = E

that is,

    Q'Q = E    (3)

Whence

    Q' = Q^{-1}    (4)

and so the following equation holds true too:

    QQ' = E    (5)

Thus, by (4), an orthogonal matrix Q may be defined as a matrix
for which the transpose Q' is equal to the inverse matrix Q^{-1}. Each one
of the equations (3) and (5) can also be taken as a definition of an
orthogonal matrix.

Since the columns of Q’ are the rows of Q, it follows from (5) that 
the square matrix Q is orthogonal if and only if the sum of the squares 
of all elements of any one of its rows is equal to unity, and the sum of 
the products of the corresponding elements of any two distinct rows is 
zero. From (3) follows an analogous assertion for the columns of a 
matrix Q. 

Taking determinants in (3), we get (since |Q'| = |Q|)

    |Q|^2 = 1

Whence it follows that the determinant of an orthogonal matrix is
equal to ±1. Thus any orthogonal transformation of unknowns is
a nonsingular transformation. We cannot, quite naturally, assert the
converse: also note that by far not every matrix with determinant
±1 is orthogonal.
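
The row criterion and the remarks on determinants can be checked on small examples. The sketch below uses exact rational arithmetic; the three sample matrices Q, R, S and all function names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def is_orthogonal(q):
    """True if Q Q' = E, i.e. the rows of Q are normalized and pairwise orthogonal."""
    return matmul(q, transpose(q)) == identity(len(q))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

Q = [[Fraction(3, 5), Fraction(4, 5)], [Fraction(-4, 5), Fraction(3, 5)]]  # determinant +1
R = [[0, 1], [1, 0]]                                                        # determinant -1
S = [[1, 1], [0, 1]]                                                        # determinant +1
```

Q and R pass the test, while S has determinant +1 and yet is not orthogonal, since its first row has scalar square 2.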


A matrix that is inverse to an orthogonal matrix will itself be
orthogonal. Indeed, taking transposes in (4), we obtain

    (Q^{-1})' = (Q')' = Q = (Q^{-1})^{-1}

On the other hand, a product of orthogonal matrices is orthogonal.
Indeed, if matrices Q and R are orthogonal, then, using (4), and
also (6) of Sec. 26 and an analogous equation which is true for
inverses, we get

    (QR)' = R'Q' = R^{-1}Q^{-1} = (QR)^{-1}


In Sec. 37, use will be made of the following assertion. 

The change-of-basis matrix from an orthonormal basis of a Euclidean
space to any other of its orthonormal bases is orthogonal.

In a space E_n, let there be given two orthonormal bases e_1,
e_2, ..., e_n and e'_1, e'_2, ..., e'_n with the change-of-basis
matrix Q = (q_{ij}),

    e' = Qe


Since the basis e is orthonormal, the scalar product of any two vectors 
(of any two vectors from the basis e’, for instance), is equal to the 
sum of the products of the corresponding coordinates of these vectors 
in the basis e. However, since basis e’ is also orthonormal, the scalar 
square of each vector of e’ is equal to unity, and the scalar product 
of any two distinct vectors of e’ is equal to zero. Whence, for the 
rows of coordinates of the vectors of basis e’ in basis e (i.e., for the 
rows of matrix Q), follow the assertions which, as derived above 
from (5), are characteristic of an orthogonal matrix. 

Orthogonal transformations of Euclidean space. It will be 
well at this point to make a study of an interesting special type of 
linear transformations of Euclidean spaces, though such transfor- 
mations will not be used in the sequel. 

A linear transformation φ of a Euclidean space E_n is called an
orthogonal transformation of that Euclidean space if it preserves the
scalar square of every vector, that is, for any vector a,

    (a\varphi, a\varphi) = (a, a)    (6)


From this we derive the following more general assertion, which 
quite naturally can also be taken as a definition of an orthogonal 
transformation. 

An orthogonal transformation φ of a Euclidean space preserves
the scalar product of any two vectors a, b:

    (a\varphi, b\varphi) = (a, b)    (7)

Indeed, by (6),

    ((a + b)\varphi, (a + b)\varphi) = (a + b, a + b)

However,

    ((a + b)\varphi, (a + b)\varphi) = (a\varphi + b\varphi, a\varphi + b\varphi)
        = (a\varphi, a\varphi) + (a\varphi, b\varphi) + (b\varphi, a\varphi) + (b\varphi, b\varphi),
    (a + b, a + b) = (a, a) + (a, b) + (b, a) + (b, b)

Whence, using (6) both for a and for b, and taking into account the
commutativity of scalar multiplication, we obtain

    2 (a\varphi, b\varphi) = 2 (a, b)

and so (7) holds true.
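
The argument just given amounts to expressing the scalar product through scalar squares. Written out as a worked identity (an equivalent restatement offered for illustration, not an additional assumption), it reads:

```latex
(a, b) = \tfrac{1}{2}\bigl[(a + b,\, a + b) - (a, a) - (b, b)\bigr],
\qquad\text{hence}\qquad
(a\varphi, b\varphi)
  = \tfrac{1}{2}\bigl[((a + b)\varphi,\, (a + b)\varphi) - (a\varphi, a\varphi) - (b\varphi, b\varphi)\bigr]
  = \tfrac{1}{2}\bigl[(a + b,\, a + b) - (a, a) - (b, b)\bigr]
  = (a, b).
```

The middle equality uses (6) three times: for the vectors a + b, a and b.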

In an orthogonal transformation of a Euclidean space, the images 
of all vectors of any orthonormal basis themselves form an orthonormal 
basis. Conversely, if a linear transformation of a Euclidean space car- 
ries at least one orthonormal basis again into an orthonormal basis, 
then the transformation is orthogonal. 

Indeed, let φ be an orthogonal transformation of the space E_n,
and let e_1, e_2, ..., e_n be an arbitrary orthonormal basis of this
space. Due to (7), there follow from the equations

    (e_i, e_i) = 1,    i = 1, 2, ..., n,
    (e_i, e_j) = 0  for  i \ne j

the equations

    (e_i\varphi, e_i\varphi) = 1,    i = 1, 2, ..., n,
    (e_i\varphi, e_j\varphi) = 0,    i \ne j

That is, the system of vectors e_1φ, e_2φ, ..., e_nφ proves to be
orthogonal and normalized; for this reason it is an orthonormal basis
of the space E_n.

Conversely, let a linear transformation φ of the space E_n carry
the orthonormal basis e_1, e_2, ..., e_n again into an orthonormal
basis; that is, the system of vectors e_1φ, e_2φ, ..., e_nφ is an
orthonormal basis of the space E_n. If

    a = \sum_{i=1}^{n} \alpha_i e_i

is an arbitrary vector of the space E_n, then

    a\varphi = \sum_{i=1}^{n} \alpha_i (e_i\varphi)

The vector aφ has the same coordinates in the basis eφ as the vector a
has in the basis e. However, both these bases are orthonormal, and
for this reason the scalar square of any vector is equal to the sum
of the squares of its coordinates in any one of these bases. Thus

    (a, a) = (a\varphi, a\varphi) = \sum_{i=1}^{n} \alpha_i^2

Equation (6) indeed holds true.

An orthogonal transformation of a Euclidean space in any ortho- 
normal basis is represented by an orthogonal matrix. Conversely, if 
a linear transformation of a Euclidean space in at least one orthonormal 
basis is represented by an orthogonal matrix, then the transformation 
is orthogonal. 

Indeed, if the transformation φ is orthogonal, and the basis e_1,
e_2, ..., e_n is orthonormal, then the system of vectors e_1φ, e_2φ,
..., e_nφ will also be an orthonormal basis. The matrix A of the
transformation φ in the basis e,

    e\varphi = Ae    (8)

will thus be the transition matrix from the orthonormal basis e to
the orthonormal basis eφ, i.e. (as proved above), it will be orthogonal.

Conversely, let a linear transformation φ be represented in an
orthonormal basis e_1, e_2, ..., e_n by the orthogonal matrix A;
then (8) holds. Since the basis e is orthonormal, the scalar product
of any vectors (in particular, any vectors of the system e_1φ,
e_2φ, ..., e_nφ) is equal to the sum of the products of the
corresponding coordinates of these vectors in the basis e. Therefore,
since matrix A is orthogonal,

    (e_i\varphi, e_i\varphi) = 1,    i = 1, 2, ..., n,
    (e_i\varphi, e_j\varphi) = 0  for  i \ne j

That is to say, the system eφ is itself an orthonormal basis for the
space E_n. Whence follows the orthogonality of the transformation φ.

As the reader will recall from analytic geometry, of all the affine 
transformations of a plane that leave the coordinate origin fixed, 
rotations (combined perhaps with mirror reflections) are the only 
ones that preserve the scalar product of the vectors. Thus, orthogonal 
transformations of n-dimensional Euclidean space may be regarded 
as “rotations” of this space. 

Obviously, one of the orthogonal transformations of Euclidean
space is the identity transformation. On the other hand, the
relationship we have established between orthogonal transformations
and orthogonal matrices, and also the relationship (presented in
Sec. 31) between operations on linear transformations and on matrices,
permit deriving, from familiar properties of orthogonal matrices,
the following properties of orthogonal transformations of Euclidean
space, which can be verified directly.

Every orthogonal transformation is nonsingular and its inverse 
is also orthogonal. 

The product of any orthogonal transformations is orthogonal.


36. Symmetric Transformations 


A linear transformation φ of n-dimensional Euclidean space is
called symmetric (or self-adjoint) if for any vectors a, b of this space
we have the equality

    (a\varphi, b) = (a, b\varphi)    (1)

That is, in scalar multiplication the symbol of symmetric
transformation may be carried from one factor to the other.

Obvious instances of symmetric transformations are the identity
transformation ε and the zero transformation ω. A more general
example is the linear transformation in which each vector is
multiplied by a fixed scalar α,

    a\varphi = \alpha a

Indeed, in this case

    (a\varphi, b) = (\alpha a, b) = \alpha (a, b) = (a, \alpha b) = (a, b\varphi)


The role of symmetric transformations is extremely great and 
calls for a detailed study. 

A symmetric transformation of a Euclidean space in any orthonor- 
mal basis is represented by a symmetric matrix. Conversely, if a linear 
transformation of a Euclidean space is represented in at least one ortho- 
normal basis by a symmetric matrix, then the transformation is sym- 
metric. 

Indeed, let the symmetric transformation φ be represented in
an orthonormal basis e_1, e_2, ..., e_n by the matrix A = (a_{ij}).
Taking into account that in an orthonormal basis the scalar product
of two vectors is equal to the sum of the products of the
corresponding coordinates of these vectors, we obtain

    (e_i\varphi, e_j) = \left( \sum_{k=1}^{n} a_{ik} e_k, e_j \right) = a_{ij}

    (e_i, e_j\varphi) = \left( e_i, \sum_{k=1}^{n} a_{jk} e_k \right) = a_{ji}

That is, due to (1),

    a_{ij} = a_{ji}

for all i and j. The matrix A is thus symmetric.

Conversely, let a linear transformation φ be represented in the
orthonormal basis e_1, e_2, ..., e_n by the symmetric matrix A = (a_{ij}),

    a_{ij} = a_{ji}  for all i and j    (2)

If

    b = \sum_{i=1}^{n} \beta_i e_i,    c = \sum_{j=1}^{n} \gamma_j e_j

are any vectors of the space, then

    b\varphi = \sum_{i=1}^{n} \beta_i (e_i\varphi) = \sum_{j=1}^{n} \left( \sum_{i=1}^{n} \beta_i a_{ij} \right) e_j

    c\varphi = \sum_{j=1}^{n} \gamma_j (e_j\varphi) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} \gamma_j a_{ji} \right) e_i

Using the fact that the e-basis is orthonormal, we get

    (b\varphi, c) = \sum_{i,j=1}^{n} \beta_i a_{ij} \gamma_j

    (b, c\varphi) = \sum_{i,j=1}^{n} \beta_i \gamma_j a_{ji}

By (2), the right sides of the latter equalities coincide, and therefore

    (b\varphi, c) = (b, c\varphi)

which completes the proof.
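
The connection between equation (1) and the symmetry of the matrix can be tried out on a small example. The sketch below assumes an orthonormal basis and the row-vector convention of this text, in which the image of a vector is its coordinate row multiplied by the matrix of the transformation; the matrices A, N and the vectors a, b are hypothetical sample data.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def apply(a, matrix):
    """Image of the coordinate row a: the row a times the matrix of the transformation."""
    n = len(matrix)
    return [sum(a[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def satisfies_equation_1(matrix, a, b):
    """Check (a*phi, b) = (a, b*phi) for one pair of vectors."""
    return dot(apply(a, matrix), b) == dot(a, apply(b, matrix))

A = [[2, 1], [1, 3]]     # symmetric matrix
N = [[2, 1], [0, 3]]     # not symmetric
a, b = [1, 2], [3, -1]
```

For the symmetric matrix A both sides of (1) coincide on this pair of vectors, whereas for N they differ, so the transformation with matrix N is not symmetric.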

The result obtained yields the following property of symmetric 
transformations that can readily be verified directly. 

The sum of symmetric transformations and also the product of a sym- 
metric transformation by a scalar are again symmetric transformations. 

We now prove the following important theorem. 

All characteristic roots of a symmetric transformation are real. 

Since the characteristic roots of any linear transformation co- 
incide with the characteristic roots of the matrix of this transforma- 
tion in any basis, and a symmetric transformation is represented 
in orthonormal bases by symmetric matrices, it suffices to prove 
the following assertion. 

All the characteristic roots of a symmetric matrix are real. 

Let λ_0 be a characteristic root (possibly complex) of the
symmetric matrix A = (a_{ij}),

    |A - \lambda_0 E| = 0

Then the system of homogeneous linear equations with complex
coefficients

    \sum_{j=1}^{n} a_{ij} x_j = \lambda_0 x_i,    i = 1, 2, ..., n

has a zero determinant, which is to say, it has a nontrivial solution
β_1, β_2, ..., β_n (generally complex). Thus,

    \sum_{j=1}^{n} a_{ij} \beta_j = \lambda_0 \beta_i,    i = 1, 2, ..., n    (3)

Multiplying both sides of each ith equation of (3) by the scalar
\bar{\beta}_i, the conjugate of β_i, and adding separately the left and
right members of all the resulting equations, we get the equation

    \sum_{i,j=1}^{n} a_{ij} \beta_j \bar{\beta}_i = \lambda_0 \sum_{i=1}^{n} \beta_i \bar{\beta}_i    (4)

The coefficient of λ_0 in (4) is a nonzero real number since it is
the sum of nonnegative real numbers, of which at least one is strictly
positive. The real nature of the number λ_0 will therefore be proved
if we prove the real nature of the left-hand side of (4); to do this,
it suffices to show that this complex number coincides with its
conjugate. Here, for the first time, we make use of the symmetric nature
of the (real) matrix A.

    \overline{\sum_{i,j=1}^{n} a_{ij} \beta_j \bar{\beta}_i}
        = \sum_{i,j=1}^{n} \overline{a_{ij} \beta_j \bar{\beta}_i}
        = \sum_{i,j=1}^{n} a_{ij} \bar{\beta}_j \beta_i
        = \sum_{i,j=1}^{n} a_{ji} \bar{\beta}_j \beta_i
        = \sum_{i,j=1}^{n} a_{ij} \bar{\beta}_i \beta_j
        = \sum_{i,j=1}^{n} a_{ij} \beta_j \bar{\beta}_i

Note that the second last equality is obtained by a simple interchange
in the summation indices: j is put in place of i, i in place of j. Hence,
the theorem is proved.
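
For matrices of order 2 the theorem can be confirmed by a direct computation, given here as a worked illustration (the letters a, b, d for the elements of the matrix are chosen for this example only):

```latex
A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \qquad
|A - \lambda E| = \lambda^2 - (a + d)\lambda + (ad - b^2), \qquad
(a + d)^2 - 4(ad - b^2) = (a - d)^2 + 4b^2 \ge 0 .
```

The discriminant of the characteristic polynomial is a sum of squares of real numbers, so both characteristic roots are real; they coincide only when a = d and b = 0.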

A linear transformation φ of the Euclidean space E_n is symmetric
if and only if there exists in E_n an orthonormal basis composed of the
eigenvectors of the transformation.

In one direction, this assertion is almost obvious: if there exists
in E_n an orthonormal basis e_1, e_2, ..., e_n, and

    e_i\varphi = \lambda_i e_i,    i = 1, 2, ..., n

then in the e-basis the transformation φ is represented by the diagonal
matrix

    \begin{pmatrix}
    \lambda_1 &           &        & 0         \\
              & \lambda_2 &        &           \\
              &           & \ddots &           \\
    0         &           &        & \lambda_n
    \end{pmatrix}

A diagonal matrix, however, is symmetric, and so the transformation
φ is represented in the orthonormal basis e by a symmetric matrix,
hence it is symmetric.

The main converse assertion of the theorem we prove by induction
with respect to the dimension n of the space E_n. Indeed, for n = 1,
any linear transformation φ of E_1 invariably carries any vector into
a proportional vector, whence it follows that any nonzero vector a
is an eigenvector for φ (incidentally, it also follows that any linear
transformation of the space E_1 is symmetric). Normalizing the
vector a, we obtain the desired orthonormal basis of the space E_1.

Let the assertion of the theorem be proved for an
(n - 1)-dimensional Euclidean space and let a symmetric transformation
φ be given in the space E_n. From the above-proved theorem follows
the existence, for φ, of a real characteristic root λ_0. Consequently,
this number is an eigenvalue of the transformation φ. If a is an
eigenvector of the transformation φ corresponding to this eigenvalue,
then any nonzero vector proportional to the vector a will (under φ)
be an eigenvector corresponding to the same eigenvalue λ_0, since

    (\alpha a)\varphi = \alpha (a\varphi) = \alpha (\lambda_0 a) = \lambda_0 (\alpha a)

In particular, normalizing the vector a, we obtain a vector e_1 such
that

    e_1\varphi = \lambda_0 e_1,
    (e_1, e_1) = 1


As was proved in Sec. 34, the nonzero vector e_1 may be included
in an orthogonal basis

    e_1, e'_2, ..., e'_n    (5)

of the space E_n. Those vectors whose first coordinate in the basis (5)
is zero, that is, vectors of the form α_2 e'_2 + ... + α_n e'_n,
obviously constitute an (n - 1)-dimensional linear subspace of the
space E_n, which we will designate by L. It will even be an
(n - 1)-dimensional Euclidean space, since a scalar product, being
defined for all vectors in E_n, is in particular defined for vectors
in L and possesses all the requisite properties.

The subspace L consists of all the vectors of E_n which are
orthogonal to the vector e_1. Indeed, if

    a = \alpha_1 e_1 + \alpha_2 e'_2 + ... + \alpha_n e'_n

then, by the orthogonality of the basis (5) and the normalized
character of the vector e_1,

    (e_1, a) = \alpha_1 (e_1, e_1) + \alpha_2 (e_1, e'_2) + ... + \alpha_n (e_1, e'_n) = \alpha_1

that is to say, (e_1, a) = 0 if and only if α_1 = 0.

If the vector a belongs to the subspace L, i.e., (e_1, a) = 0, then
the vector aφ too lies in L. Indeed, because of the symmetry of the
transformation φ,

    (e_1, a\varphi) = (e_1\varphi, a) = (\lambda_0 e_1, a) = \lambda_0 (e_1, a) = \lambda_0 \cdot 0 = 0

That is, the vector aφ is orthogonal to e_1 and therefore lies in L.
This property of the subspace L, which is called its invariance under
the transformation φ, enables us to consider φ (regarded solely with
respect to the vectors in L) as a linear transformation of this
(n - 1)-dimensional Euclidean space. It will even be a symmetric
transformation of the space L, since equation (1), which holds for any
vectors in E_n, will hold (as a particular case) for vectors lying in L.

By virtue of the induction hypothesis, space L has an orthonormal
basis consisting of the eigenvectors of the transformation φ; denote
it by e_2, ..., e_n. All these vectors are orthogonal to the vector e_1,
and so e_1, e_2, ..., e_n is the desired orthonormal basis of the space
E_n consisting of the eigenvectors of the transformation φ. The theorem
is proved.


37. Reducing a Quadratic Form to Principal Axes. 
Pairs of Forms 


Let us apply the last theorem of the preceding section to prove 
the following matrix theorem. 

For every symmetric matrix A it is possible to find an orthogonal
matrix Q which diagonalizes matrix A, that is, the matrix Q^{-1}AQ
obtained by transforming matrix A by matrix Q will be diagonal.

Let there be given a symmetric matrix A of order n. If e_1,
e_2, ..., e_n is some orthonormal basis of an n-dimensional Euclidean
space E_n, then matrix A represents in this basis a symmetric
transformation φ. As has been proved, there is in E_n an orthonormal
basis f_1, f_2, ..., f_n made up of the eigenvectors of the
transformation φ. In this basis, φ is represented by the diagonal
matrix B (see Sec. 33). Then, by Sec. 31,

    B = Q^{-1}AQ    (1)

where Q is the change-of-basis matrix from the f-basis to the e-basis,

    e = Qf    (2)

This matrix, as a matrix for changing from one orthonormal basis
to another similar basis, will be orthogonal (see Sec. 35). The theorem
is proved.

Since the inverse of an orthogonal matrix Q is equal to its
transpose, Q^{-1} = Q', equation (1) may be rewritten as

    B = Q'AQ
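
For a symmetric matrix of order 2 an orthogonal diagonalizing matrix can be written down explicitly as a rotation. The following is a minimal sketch, assuming the well-known rotation-angle formula tan 2θ = 2b / (a - d) for the matrix with rows (a, b) and (b, d); this formula, the function names and the sample matrix are not part of the text and serve only as an illustration.

```python
import math

def diagonalizing_rotation(a, b, d):
    """Orthogonal Q such that Q'AQ is diagonal for the symmetric A = [[a, b], [b, d]]."""
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

A = [[2, 1], [1, 2]]                       # hypothetical symmetric matrix
Q = diagonalizing_rotation(2, 1, 2)
B = matmul(matmul(transpose(Q), A), Q)     # B = Q'AQ
```

Up to rounding, B is diagonal with the numbers 3 and 1 on the principal diagonal, which are the characteristic roots of A.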

From Sec. 26, however, we know that this is precisely how the
symmetric matrix A of a quadratic form is transformed under a linear
transformation of the unknowns with the matrix Q. Taking into account
that a linear transformation of unknowns with an orthogonal matrix is
an orthogonal transformation (see Sec. 35) and that a quadratic form
reduced to canonical form has a diagonal matrix, we arrive, on the
basis of the preceding theorem, at the following theorem on the
reduction of a real quadratic form to principal axes.

Every real quadratic form f(x_1, x_2, ..., x_n) can be reduced to
canonical form by an orthogonal transformation of the unknowns.

Although there may be many different orthogonal transformations
of the unknowns which reduce the given quadratic form to
canonical form, the canonical form itself is actually determined
uniquely.

No matter what the orthogonal transformation that reduces to canonical
form the quadratic form $f(x_1, x_2, \ldots, x_n)$ with matrix $A$, the
coefficients of this canonical form are the characteristic roots of the
matrix $A$ (counting multiplicities).

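Suppose an orthogonal transformation reduces form $f$ to the
canonical form

$$f(x_1, x_2, \ldots, x_n) = \mu_1 y_1^2 + \mu_2 y_2^2 + \ldots + \mu_n y_n^2$$

This orthogonal transformation preserves the sum of the squares
of the unknowns and so, if $\lambda$ is a new unknown,

$$f(x_1, x_2, \ldots, x_n) - \lambda \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} \mu_i y_i^2 - \lambda \sum_{i=1}^{n} y_i^2$$

Taking determinants of these quadratic forms and taking into account
that after completing the linear transformation the determinant
of the quadratic form is multiplied by the square of the determinant
of the transformation (see Sec. 28), and the square of the
determinant of an orthogonal transformation is equal to unity (see
Sec. 35), we get the equation

$$|A - \lambda E| = \begin{vmatrix} \mu_1 - \lambda & 0 & \cdots & 0 \\ 0 & \mu_2 - \lambda & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_n - \lambda \end{vmatrix} = \prod_{i=1}^{n} (\mu_i - \lambda)$$

from which follows the assertion of the theorem.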

This result may be stated in matrix form as well. 

No matter what the orthogonal matrix which diagonalizes the symmetric
matrix $A$, the principal diagonal of the resulting diagonal matrix
will exhibit the characteristic roots of the matrix $A$ taken with
their multiplicities.

Finding the orthogonal transformation that reduces a quadratic
form to principal axes. In certain problems it is necessary to know
not only the canonical form to which a real quadratic form is reduced
by an orthogonal transformation, but also the orthogonal
transformation itself which accomplishes the reduction. It would be
rather difficult to find this transformation by using the
principal-axis theorem so we shall point out a different way. Namely,
all we need to know is how to find the orthogonal matrix $Q$ which
diagonalizes the given symmetric matrix $A$, or, what is the same thing,
to find its inverse matrix $Q^{-1}$. By (2), this is the change-of-basis
matrix from the $e$-basis to the $f$-basis; that is, its rows are
coordinate rows (in the $e$-basis) of an orthonormal system of $n$
eigenvectors of the symmetric transformation $\varphi$ defined by the
matrix $A$ in the $e$-basis. It remains to find such a system of
eigenvectors.

Let $\lambda_0$ be any characteristic root of the matrix $A$ and let its
multiplicity be equal to $k_0$. From Sec. 33 we know that the collection
of coordinate rows of all eigenvectors of the transformation $\varphi$
corresponding to the eigenvalue $\lambda_0$ coincides with the set of
nonzero solutions of the system of homogeneous linear equations

$$(A - \lambda_0 E) X = 0 \tag{3}$$

Here, the symmetric nature of the matrix $A$ enables us to write $A$
in place of $A'$. From the above-proved theorems on the existence of
an orthogonal matrix that diagonalizes the symmetric matrix $A$,
and on the uniqueness of this diagonal form, it follows that for
system (3) it is possible to find at least $k_0$ linearly independent
solutions. We seek such a system of solutions by the methods taken from
Sec. 12, and then we orthogonalize and normalize the resulting
system in accord with Sec. 34.

Taking in turn, for $\lambda_0$, all the different characteristic roots
of the symmetric matrix $A$ and noting that the sum of the
multiplicities of these roots is equal to $n$, we obtain a set of $n$
eigenvectors of the transformation $\varphi$ represented by their
coordinates in the $e$-basis. To prove that this is the desired
orthonormal system of eigenvectors, it remains to prove the following
lemma.

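The eigenvectors of the symmetric transformation $\varphi$ which
correspond to distinct eigenvalues are mutually orthogonal.

Suppose that

$$b\varphi = \lambda_1 b, \quad c\varphi = \lambda_2 c$$

and $\lambda_1 \neq \lambda_2$. Since

$$(b\varphi, c) = (\lambda_1 b, c) = \lambda_1 (b, c),$$
$$(b, c\varphi) = (b, \lambda_2 c) = \lambda_2 (b, c)$$

it follows from

$$(b\varphi, c) = (b, c\varphi)$$

that

$$\lambda_1 (b, c) = \lambda_2 (b, c)$$

or, because $\lambda_1 \neq \lambda_2$,

$$(b, c) = 0$$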


which is what we set out to prove.

Example. Reduce to principal axes the quadratic form

$$f(x_1, x_2, x_3, x_4) = 2x_1x_2 + 2x_1x_3 - 2x_1x_4 - 2x_2x_3 + 2x_2x_4 + 2x_3x_4$$

The matrix $A$ of this form is

$$A = \begin{pmatrix} 0 & 1 & 1 & -1 \\ 1 & 0 & -1 & 1 \\ 1 & -1 & 0 & 1 \\ -1 & 1 & 1 & 0 \end{pmatrix}$$

Let us find its characteristic polynomial:

$$|A - \lambda E| = \begin{vmatrix} -\lambda & 1 & 1 & -1 \\ 1 & -\lambda & -1 & 1 \\ 1 & -1 & -\lambda & 1 \\ -1 & 1 & 1 & -\lambda \end{vmatrix} = (\lambda - 1)^3 (\lambda + 3)$$

Thus, the matrix $A$ has a triple characteristic root 1 and a simple
characteristic root $-3$. Hence, we can already write the canonical form
to which the form $f$ is reduced by an orthogonal transformation:

$$f = y_1^2 + y_2^2 + y_3^2 - 3y_4^2$$
Let us find the orthogonal transformation that accomplishes this reduction.
The system of homogeneous linear equations (3) becomes, for $\lambda_0 = 1$,

$$\begin{aligned} -x_1 + x_2 + x_3 - x_4 &= 0, \\ x_1 - x_2 - x_3 + x_4 &= 0, \\ x_1 - x_2 - x_3 + x_4 &= 0, \\ -x_1 + x_2 + x_3 - x_4 &= 0 \end{aligned}$$

The rank of this system is unity and so we can find three linearly
independent solutions for it. For example, the vectors

$$b_1 = (1, 1, 0, 0), \quad b_2 = (1, 0, 1, 0), \quad b_3 = (-1, 0, 0, 1)$$

will be such solutions.
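Orthogonalizing this system of vectors, we obtain the following system of
vectors:

$$c_1 = b_1 = (1, 1, 0, 0),$$
$$c_2 = -\tfrac{1}{2} c_1 + b_2 = \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1, 0\right),$$
$$c_3 = \tfrac{1}{2} c_1 + \tfrac{1}{3} c_2 + b_3 = \left(-\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, 1\right)$$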


On the other hand, the system of homogeneous linear equations (3) becomes,
for $\lambda_0 = -3$,

$$\begin{aligned} 3x_1 + x_2 + x_3 - x_4 &= 0, \\ x_1 + 3x_2 - x_3 + x_4 &= 0, \\ x_1 - x_2 + 3x_3 + x_4 &= 0, \\ -x_1 + x_2 + x_3 + 3x_4 &= 0 \end{aligned}$$

This system has rank 3. Its nonzero solution is the vector

$$c_4 = (1, -1, -1, 1)$$

The system of vectors $c_1, c_2, c_3, c_4$ is orthogonal. Normalizing it,
we arrive at the orthonormal system of vectors


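$$c_1' = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0, 0\right),$$
$$c_2' = \left(\tfrac{1}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}, \sqrt{\tfrac{2}{3}}, 0\right),$$
$$c_3' = \left(-\tfrac{1}{2\sqrt{3}}, \tfrac{1}{2\sqrt{3}}, \tfrac{1}{2\sqrt{3}}, \tfrac{\sqrt{3}}{2}\right),$$
$$c_4' = \left(\tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}\right)$$

Thus, the form $f$ is reduced to principal axes by the orthogonal transformation

$$\begin{aligned} y_1 &= \tfrac{1}{\sqrt{2}} x_1 + \tfrac{1}{\sqrt{2}} x_2, \\ y_2 &= \tfrac{1}{\sqrt{6}} x_1 - \tfrac{1}{\sqrt{6}} x_2 + \sqrt{\tfrac{2}{3}}\, x_3, \\ y_3 &= -\tfrac{1}{2\sqrt{3}} x_1 + \tfrac{1}{2\sqrt{3}} x_2 + \tfrac{1}{2\sqrt{3}} x_3 + \tfrac{\sqrt{3}}{2} x_4, \\ y_4 &= \tfrac{1}{2} x_1 - \tfrac{1}{2} x_2 - \tfrac{1}{2} x_3 + \tfrac{1}{2} x_4 \end{aligned}$$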

It is well to note that the choice of a system of linearly independent
eigenvectors corresponding to a multiple eigenvalue is far from unique, and
so there are many different orthogonal transformations which reduce the form $f$
to canonical form. We found only one of them.
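
The computations of this example can be checked mechanically. The following is a minimal Python sketch, not part of the original exposition: it orthogonalizes $b_1, b_2, b_3$ in exact rational arithmetic, appends $c_4$, normalizes, and confirms that the matrix whose rows are the resulting vectors transforms $A$ into the diagonal matrix with entries 1, 1, 1, $-3$. The helper names (`dot`, `gram_schmidt`, `rows`) are illustrative assumptions.

```python
from fractions import Fraction
from math import isclose, sqrt

# Symmetric matrix of the quadratic form f from the example.
A = [[0, 1, 1, -1],
     [1, 0, -1, 1],
     [1, -1, 0, 1],
     [-1, 1, 1, 0]]

def dot(u, v):
    """Scalar product of two coordinate rows."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize (without normalizing) in exact rational arithmetic."""
    basis = []
    for b in vectors:
        b = [Fraction(x) for x in b]
        c = b[:]
        for prev in basis:
            k = dot(b, prev) / dot(prev, prev)
            c = [ci - k * pi for ci, pi in zip(c, prev)]
        basis.append(c)
    return basis

# Eigenvectors for the triple root 1, then the eigenvector for the root -3.
c = gram_schmidt([(1, 1, 0, 0), (1, 0, 1, 0), (-1, 0, 0, 1)])
c.append([Fraction(x) for x in (1, -1, -1, 1)])

# Rows of Q^{-1}: the normalized eigenvectors.
rows = [[float(x) / sqrt(float(dot(v, v))) for x in v] for v in c]

# B[i][j] = r_i A r_j', which should be diag(1, 1, 1, -3).
B = [[dot(ri, [dot(a_row, rj) for a_row in A]) for rj in rows] for ri in rows]

expected = [1, 1, 1, -3]
is_diagonal = all(
    isclose(B[i][j], expected[i] if i == j else 0, abs_tol=1e-12)
    for i in range(4) for j in range(4)
)
print(c[2], is_diagonal)
```

Exact fractions are used for the orthogonalization so that the intermediate vectors $c_2$, $c_3$ can be compared directly with those in the text; floating point enters only at the normalization step.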


Pairs of forms. Let there be a pair of real quadratic forms in $n$
unknowns, $f(x_1, x_2, \ldots, x_n)$ and $g(x_1, x_2, \ldots, x_n)$. Does
there exist a nonsingular linear transformation of the unknowns
$x_1, x_2, \ldots, x_n$ that will simultaneously reduce both forms to
canonical form?

In the general case, the answer is no. Let us examine the pair
of forms

$$f(x_1, x_2) = x_1^2, \quad g(x_1, x_2) = x_1 x_2$$

Let there be a nonsingular linear transformation

$$\begin{aligned} x_1 &= c_{11} y_1 + c_{12} y_2, \\ x_2 &= c_{21} y_1 + c_{22} y_2 \end{aligned} \tag{4}$$

which reduces both forms to canonical form. For $f$ to be reduced by
transformation (4) to canonical form, one of the coefficients $c_{11}$,
$c_{12}$ must be zero, otherwise the term $2c_{11}c_{12}y_1y_2$ would
occur. Renumbering, if necessary, the unknowns $y_1$, $y_2$, we can set
$c_{12} = 0$ and so $c_{11} \neq 0$. However, we now find that

$$g(x_1, x_2) = c_{11} y_1 (c_{21} y_1 + c_{22} y_2) = c_{11}c_{21} y_1^2 + c_{11}c_{22} y_1 y_2$$

Since the form $g$ was also to become canonical, it follows that
$c_{11}c_{22} = 0$, that is, $c_{22} = 0$, which, together with
$c_{12} = 0$, contradicts the nonsingularity of the linear
transformation (4).

The situation is different if we assume that at least one of our
forms, say $g(x_1, x_2, \ldots, x_n)$, is positive definite.* Namely, the
following theorem holds.

If $f$ and $g$ form a pair of real quadratic forms in $n$ unknowns, and the
second one is positive definite, then there exists a nonsingular linear
transformation which simultaneously reduces $g$ to normal form and $f$
to canonical form.

For proof, first perform the nonsingular linear transformation
of the unknowns $x_1, x_2, \ldots, x_n$,

$$X = TY$$

which reduces the positive definite form $g$ to normal form,

$$g(x_1, x_2, \ldots, x_n) = y_1^2 + y_2^2 + \ldots + y_n^2$$

Then $f$ will go into some form $\varphi$ in new unknowns,

$$f(x_1, x_2, \ldots, x_n) = \varphi(y_1, y_2, \ldots, y_n)$$

Now perform an orthogonal transformation of the unknowns
$y_1, y_2, \ldots, y_n$,

$$Y = QZ$$

which reduces $\varphi$ to principal axes,

$$\varphi(y_1, y_2, \ldots, y_n) = \lambda_1 z_1^2 + \lambda_2 z_2^2 + \ldots + \lambda_n z_n^2$$

This transformation (see definition in Sec. 35) carries the sum of the
squares of the unknowns $y_1, y_2, \ldots, y_n$ into the sum of the squares
of the unknowns $z_1, z_2, \ldots, z_n$. As a result we get

$$f(x_1, x_2, \ldots, x_n) = \lambda_1 z_1^2 + \lambda_2 z_2^2 + \ldots + \lambda_n z_n^2,$$
$$g(x_1, x_2, \ldots, x_n) = z_1^2 + z_2^2 + \ldots + z_n^2$$

That is, the linear transformation

$$X = (TQ) Z$$

is the required one.


* This condition is not, of course, necessary; thus, both the forms
$x_1^2 + x_2^2 - x_3^2$ and $x_1^2 - x_2^2 - x_3^2$ already have
canonical form, though neither is positive definite.


CHAPTER 9 


EVALUATING ROOTS 
OF POLYNOMIALS 


38. Equations of Second, Third, and Fourth Degree


The fundamental theorem proved in Sec. 23 establishes the existence
of $n$ complex roots for any polynomial of degree $n$ with numerical
coefficients. The proofs, however, do not provide any methods for
finding these roots. They are thus pure “existence proofs”. The search
for such methods began naturally in attempts to derive formulas similar
to the one used in the solution of quadratic equations for the case of
real coefficients so familiar from school algebra. We will now show
that this formula holds true for quadratic equations with complex
coefficients as well, and that analogous formulas (though much more
involved) can be derived for equations of the third and fourth degree.

Quadratic equations. Suppose we have a quadratic equation

$$x^2 + px + q = 0$$

with arbitrary complex coefficients; the leading coefficient may,
without loss of generality, be considered equal to unity. This equation
may be written as

$$\left(x + \frac{p}{2}\right)^2 + \left(q - \frac{p^2}{4}\right) = 0$$

As we know, it is possible to take the square root of the complex
number $\frac{p^2}{4} - q$ without going outside the complex-number system.
The two values of this root, which differ in sign alone, can be written
as $\pm\sqrt{\frac{p^2}{4} - q}$. Therefore,

$$x + \frac{p}{2} = \pm\sqrt{\frac{p^2}{4} - q}$$

That is, the roots of the given equation may be found via the usual
formula

$$x = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q}$$


Example. Solve

$$x^2 - 3x + (3 - i) = 0$$

Using the formula derived above, we get

$$x = \frac{3}{2} \pm \sqrt{\frac{9}{4} - (3 - i)} = \frac{3}{2} \pm \frac{1}{2}\sqrt{-3 + 4i}$$

Applying the methods of Sec. 19, we find

$$\sqrt{-3 + 4i} = \pm(1 + 2i)$$

and so

$$x_1 = 2 + i, \quad x_2 = 1 - i$$
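
The same formula can be evaluated numerically. The sketch below is an illustration only, using Python's standard `cmath` module; the function name `solve_monic_quadratic` is an assumption made for this example. It applies $x = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q}$ to the equation just solved.

```python
import cmath

def solve_monic_quadratic(p, q):
    """Roots of x^2 + p x + q = 0 for complex p, q.

    Uses x = -p/2 +/- sqrt(p^2/4 - q); cmath.sqrt returns one of the two
    square roots, and the other differs from it in sign alone.
    """
    root = cmath.sqrt(p * p / 4 - q)
    return (-p / 2 + root, -p / 2 - root)

# The example from the text: x^2 - 3x + (3 - i) = 0.
x1, x2 = solve_monic_quadratic(-3, 3 - 1j)
print(x1, x2)
```

Up to rounding, the two values agree with the roots $2 + i$ and $1 - i$ found above.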


Cubic equations. Unlike the case of quadratic equations, we have
not had a procedure for solving cubic equations even in the case of
real coefficients. We will now derive a formula for cubic equations
similar to the formula used for quadratic equations, and we will
assume from the start that the coefficients can be any complex numbers.

Suppose we have the cubic equation

$$y^3 + ay^2 + by + c = 0 \tag{1}$$

with arbitrary complex coefficients. Replacing in (1) the unknown
$y$ by a new unknown $x$ related to $y$ by the equation

$$y = x - \frac{a}{3} \tag{2}$$

we get an equation in the unknown $x$ which, as can readily be verified,
does not contain the square of the unknown; that is, we have
an equation of the form

$$x^3 + px + q = 0 \tag{3}$$

If the roots of (3) are found, then, by (2), we will get the roots of the
given equation (1) as well. Our job, therefore, is to learn to solve
the “incomplete” cubic equation (3) with arbitrary complex coefficients.
By the fundamental theorem, equation (3) has three complex
roots. Let $x_0$ be one of them. We introduce an auxiliary unknown $u$
and consider the polynomial

$$f(u) = u^2 - x_0 u - \frac{p}{3}$$

Its coefficients are complex numbers and therefore it has two complex
roots $\alpha$ and $\beta$; by Vieta's formulas,

$$\alpha + \beta = x_0, \tag{4}$$
$$\alpha\beta = -\frac{p}{3} \tag{5}$$

Substituting expression (4) of the root $x_0$ into (3), we get

$$(\alpha + \beta)^3 + p(\alpha + \beta) + q = 0$$

or

$$\alpha^3 + \beta^3 + (3\alpha\beta + p)(\alpha + \beta) + q = 0$$

However, from (5) it follows that $3\alpha\beta + p = 0$, and so we have

$$\alpha^3 + \beta^3 = -q \tag{6}$$

On the other hand, from (5) it follows that

$$\alpha^3\beta^3 = -\frac{p^3}{27} \tag{7}$$

Equations (6) and (7) show that the numbers $\alpha^3$ and $\beta^3$ are roots
of the quadratic equation

$$z^2 + qz - \frac{p^3}{27} = 0 \tag{8}$$

with complex coefficients.
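Solving (8), we get

$$z = -\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}$$

whence*

$$\alpha = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}, \quad \beta = \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} \tag{9}$$

We arrive at the following formula (Cardan's formula) which
expresses the roots of equation (3) in terms of its coefficients by means
of radicals of index 2 and index 3:

$$x_0 = \alpha + \beta = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}$$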

Since a cube root has three values in the field of complex numbers,
formulas (9) yield three values for $\alpha$ and three for $\beta$. However,
when using Cardan's formula, one cannot combine just any value
of the root $\alpha$ with any value of the root $\beta$; for a given value
of $\alpha$ we have to take only that one of the three values of $\beta$
which satisfies condition (5).

Let $\alpha_1$ be any one of the three values of the root $\alpha$. Then the
two others may be obtained, as was proved in Sec. 19, by multiplying
$\alpha_1$ by the cube roots $\varepsilon$ and $\varepsilon^2$ of unity:

$$\alpha_2 = \alpha_1\varepsilon, \quad \alpha_3 = \alpha_1\varepsilon^2$$


Denote by $\beta_1$ that one of the three values of the root $\beta$ which
corresponds to the value $\alpha_1$ of the root $\alpha$ on the basis of (5),
that is, $\alpha_1\beta_1 = -\frac{p}{3}$. The two other values of $\beta$ are

$$\beta_2 = \beta_1\varepsilon, \quad \beta_3 = \beta_1\varepsilon^2$$

Since, by $\varepsilon^3 = 1$,

$$\alpha_2\beta_3 = \alpha_1\varepsilon \cdot \beta_1\varepsilon^2 = \alpha_1\beta_1\varepsilon^3 = \alpha_1\beta_1 = -\frac{p}{3}$$

it follows that the value $\alpha_2$ of root $\alpha$ is associated with the
value $\beta_3$ of root $\beta$; similarly, to the value $\alpha_3$ there
corresponds the value $\beta_2$. Thus, all three roots of equation (3) can
be written as follows:

$$\begin{aligned} x_1 &= \alpha_1 + \beta_1, \\ x_2 &= \alpha_2 + \beta_3 = \alpha_1\varepsilon + \beta_1\varepsilon^2, \\ x_3 &= \alpha_3 + \beta_2 = \alpha_1\varepsilon^2 + \beta_1\varepsilon \end{aligned} \tag{10}$$

* It is immaterial which of the roots of (8) we take for $\alpha^3$ and which
one for $\beta^3$ since $\alpha$ and $\beta$ enter in symmetrical fashion into
(6) and (7) and also into the expression (4) for $x_0$.

Cubic equations with real coefficients. Let us see what can be
said about the roots of the reduced cubic equation

$$x^3 + px + q = 0 \tag{11}$$

if its coefficients are real. It turns out that in this case the main role
is played by the sign of the expression $\frac{q^2}{4} + \frac{p^3}{27}$,
which in Cardan's formula is under the square-root sign. Notice that the
sign of this expression is the opposite of the sign of the expression

$$D = -4p^3 - 27q^2 = -108\left(\frac{q^2}{4} + \frac{p^3}{27}\right)$$

which is called the discriminant of equation (11) (see Sec. 54, below).
The sign of the discriminant will be used in subsequent statements.

(1) Let $D < 0$. In this case, there is a positive number under
each of the square-root signs in Cardan's formula, and so each of the
cube roots involves real numbers. However, a cube root of a real
number has one real and two conjugate complex values. Let $\alpha_1$ be
the real value of the root $\alpha$; then the value $\beta_1$ of the root
$\beta$, corresponding to $\alpha_1$ on the basis of formula (5), will also
be real because the number $p$ is real. Thus, the root
$x_1 = \alpha_1 + \beta_1$ of equation (11) is real. We find the other two
roots by replacing, in formulas (10) of this section, the roots of unity
$\varepsilon = \varepsilon_1$ and $\varepsilon^2 = \varepsilon_2$ by their
expressions (7), Sec. 19:

$$x_2 = \alpha_1\varepsilon + \beta_1\varepsilon^2 = \alpha_1\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right) + \beta_1\left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right) = -\frac{\alpha_1 + \beta_1}{2} + i\sqrt{3}\,\frac{\alpha_1 - \beta_1}{2},$$

$$x_3 = \alpha_1\varepsilon^2 + \beta_1\varepsilon = \alpha_1\left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right) + \beta_1\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right) = -\frac{\alpha_1 + \beta_1}{2} - i\sqrt{3}\,\frac{\alpha_1 - \beta_1}{2}$$

Since the numbers $\alpha_1$ and $\beta_1$ are real, these two roots turn out
to be conjugate complex numbers, the coefficient of the imaginary part
being different from zero, since $\alpha_1 \neq \beta_1$: these numbers are
the values of distinct cube roots.

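Thus, if $D < 0$, then equation (11) has one real and two conjugate
complex roots.

(2) Let $D = 0$. Then

$$\alpha = \sqrt[3]{-\frac{q}{2}}, \quad \beta = \sqrt[3]{-\frac{q}{2}}$$

Let $\alpha_1$ be the real value of the root $\alpha$; then $\beta_1$ will
also, by (5), be a real number, and $\alpha_1 = \beta_1$. Replacing, in
formulas (10), $\beta_1$ by $\alpha_1$ and using the obvious equality
$\varepsilon + \varepsilon^2 = -1$, we get

$$x_1 = 2\alpha_1, \quad x_2 = \alpha_1(\varepsilon + \varepsilon^2) = -\alpha_1, \quad x_3 = \alpha_1(\varepsilon^2 + \varepsilon) = -\alpha_1$$

Thus, if $D = 0$, then all roots of (11) are real and two of them are
equal.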


(3) Finally, let $D > 0$. Then in Cardan's formula there is a negative
real number under the square root sign. Therefore, under the
signs of the cube roots we have conjugate complex numbers. Thus,
all the values of the roots $\alpha$ and $\beta$ will now be complex numbers.
However, there must be at least one real root among the roots of
equation (11). Let this root be

$$x_1 = \alpha_0 + \beta_0$$

Since both the sum of the numbers $\alpha_0$ and $\beta_0$ and their product,
equal to $-\frac{p}{3}$, are real, it follows that the numbers $\alpha_0$ and
$\beta_0$ are conjugate as roots of a quadratic equation with real
coefficients. But then the numbers $\alpha_0\varepsilon$ and
$\beta_0\varepsilon^2$ and likewise the numbers $\alpha_0\varepsilon^2$ and
$\beta_0\varepsilon$ are also conjugate, whence it follows that the roots of
equation (11)

$$x_2 = \alpha_0\varepsilon + \beta_0\varepsilon^2, \quad x_3 = \alpha_0\varepsilon^2 + \beta_0\varepsilon$$

are real numbers too.

We thus see that the three roots of (11) are real, and it is easy
to show that they are all distinct, for otherwise the choice of a
root $x_1$ might be accomplished so that we would get the equality
$x_2 = x_3$, whence

$$\alpha_0(\varepsilon - \varepsilon^2) = \beta_0(\varepsilon - \varepsilon^2)$$

or $\alpha_0 = \beta_0$, which is clearly impossible.

Thus, if $D > 0$, then equation (11) has three distinct real roots.

The last case that we have just considered shows that Cardan's
formula is of slight practical value. Indeed, although for $D > 0$
all roots of (11) with real coefficients are real numbers, to find them
using Cardan's formula requires extracting the cube roots of complex
numbers, which is only possible if the numbers are represented
in trigonometric form. That is why there is no practical value in
writing the roots as radicals. Using methods that go beyond the
scope of this book, we could demonstrate that in the case at hand
the roots of equation (11) cannot, in general, be expressed in terms
of coefficients by means of radicals with real radicands. This case
of the solution of (11) is called the irreducible case (not to be
confused with the irreducibility of polynomials).


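Example 1. Solve the equation

$$y^3 + 3y^2 - 3y - 14 = 0$$

The substitution $y = x - 1$ reduces this equation to

$$x^3 - 6x - 9 = 0 \tag{12}$$

Here, $p = -6$, $q = -9$, and so

$$\frac{q^2}{4} + \frac{p^3}{27} = \frac{49}{4} > 0$$

That is, equation (12) has one real and two conjugate complex roots. By (9),

$$\alpha = \sqrt[3]{\frac{9}{2} + \frac{7}{2}} = \sqrt[3]{8}, \quad \beta = \sqrt[3]{\frac{9}{2} - \frac{7}{2}} = \sqrt[3]{1}$$

For this reason, $\alpha_1 = 2$, $\beta_1 = 1$, i.e., $x_1 = 3$. The other
two roots can be found by using formulas (10):

$$x_2 = -\frac{3}{2} + i\frac{\sqrt{3}}{2}, \quad x_3 = -\frac{3}{2} - i\frac{\sqrt{3}}{2}$$

This implies that the roots of the given equation are the numbers

$$y_1 = 2, \quad y_2 = -\frac{5}{2} + i\frac{\sqrt{3}}{2}, \quad y_3 = -\frac{5}{2} - i\frac{\sqrt{3}}{2}$$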

Example 2. Solve

$$x^3 - 12x + 16 = 0$$

Here, $p = -12$, $q = 16$, and so

$$\frac{q^2}{4} + \frac{p^3}{27} = 0$$

Whence $\alpha = \sqrt[3]{-8}$, or $\alpha_1 = -2$. And therefore

$$x_1 = -4, \quad x_2 = x_3 = 2$$

Example 3. Solve

$$x^3 - 19x + 30 = 0$$

Here, $p = -19$, $q = 30$, and so

$$\frac{q^2}{4} + \frac{p^3}{27} = -\frac{784}{27} < 0$$

Thus, Cardan's formula cannot be applied to this equation if we remain in the
domain of real numbers, although the roots are the real numbers 2, 3, $-5$.
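
The three cases distinguished by the sign of the discriminant can be checked on Examples 1 to 3 with exact arithmetic. The following Python sketch is illustrative; the function names `discriminant` and `classify` are assumptions made for this example.

```python
from fractions import Fraction

def discriminant(p, q):
    """D = -4 p^3 - 27 q^2 for x^3 + p x + q = 0 with rational p, q."""
    p, q = Fraction(p), Fraction(q)
    return -4 * p ** 3 - 27 * q ** 2

def classify(p, q):
    """Describe the roots of x^3 + p x + q = 0 (real coefficients) by the sign of D."""
    d = discriminant(p, q)
    if d < 0:
        return "one real root and two conjugate complex roots"
    if d == 0:
        return "all roots real, two of them equal"
    return "three distinct real roots"

for p, q in [(-6, -9), (-12, 16), (-19, 30)]:   # Examples 1, 2, 3
    print(p, q, discriminant(p, q), classify(p, q))
```

Exact fractions are used because the case $D = 0$ cannot be detected reliably in floating-point arithmetic.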
Quartic equations. The solution of the quartic equation

$$y^4 + ay^3 + by^2 + cy + d = 0 \tag{13}$$

with arbitrary complex coefficients reduces to a solution of some
auxiliary cubic equation. This is achieved by a procedure due to Ferrari.

First, the substitution $y = x - \frac{a}{4}$ reduces equation (13) to the
form

$$x^4 + px^2 + qx + r = 0 \tag{14}$$

The left member of this equation is then identically transformed with
the aid of the auxiliary parameter $\alpha$:

$$x^4 + px^2 + qx + r = \left(x^2 + \frac{p}{2} + \alpha\right)^2 + qx + r - \frac{p^2}{4} - \alpha^2 - 2\alpha x^2 - p\alpha$$

or

$$\left(x^2 + \frac{p}{2} + \alpha\right)^2 - \left[2\alpha x^2 - qx + \left(\alpha^2 + p\alpha - r + \frac{p^2}{4}\right)\right] = 0 \tag{15}$$

Now choose $\alpha$ so that the polynomial in the square brackets is a
perfect square. This requires that it have one double root; in other
words, we must have the equation

$$q^2 - 4 \cdot 2\alpha\left(\alpha^2 + p\alpha - r + \frac{p^2}{4}\right) = 0 \tag{16}$$


Equation (16) is a cubic equation in the unknown $\alpha$ with complex
coefficients. As we know, this equation has three complex roots.
Let $\alpha_0$ be one of them; it is expressed, by Cardan's formula, with
the aid of radicals in terms of the coefficients of equation (16), that
is, in terms of the coefficients of equation (14).

Given this choice of value for $\alpha$, the polynomial in the square
brackets in (15) has the double root $\frac{q}{4\alpha_0}$, and so
equation (15) takes the form

$$\left(x^2 + \frac{p}{2} + \alpha_0\right)^2 - 2\alpha_0\left(x - \frac{q}{4\alpha_0}\right)^2 = 0$$

Hence it decomposes into two quadratic equations:

$$\begin{aligned} x^2 - \sqrt{2\alpha_0}\,x + \left(\frac{p}{2} + \alpha_0 + \frac{q}{2\sqrt{2\alpha_0}}\right) &= 0, \\ x^2 + \sqrt{2\alpha_0}\,x + \left(\frac{p}{2} + \alpha_0 - \frac{q}{2\sqrt{2\alpha_0}}\right) &= 0 \end{aligned} \tag{17}$$

Since we passed from (14) to (17) by means of identity transformations,
the roots of (17) will serve as roots for equation (14) as
well. At the same time, it is easy to see that the roots of (14) are
expressed in terms of coefficients by means of radicals. We will
not write out the appropriate formulas because they are exceedingly
unwieldy and of no practical use. Neither will we investigate
separately the case when (14) has real coefficients.
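
Although the explicit formulas are unwieldy, the procedure itself is short when carried out numerically. The following Python sketch is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the auxiliary cubic (16) is rewritten in the monic form $\alpha^3 + p\alpha^2 + \left(\frac{p^2}{4} - r\right)\alpha - \frac{q^2}{8} = 0$ and one of its roots is found by Cardan's formula, and $q \neq 0$ is assumed so that $\alpha_0 \neq 0$ and the division in (17) is legitimate.

```python
import cmath

def one_root_of_cubic(a, b, c):
    """One root of y^3 + a y^2 + b y + c = 0 via y = x - a/3 and Cardan's formula."""
    p = b - a * a / 3
    q = 2 * a ** 3 / 27 - a * b / 3 + c
    s = cmath.sqrt(q * q / 4 + p ** 3 / 27)
    a3 = max(-q / 2 + s, -q / 2 - s, key=abs)
    if a3 == 0:                      # reduced equation is x^3 = 0
        return -a / 3
    alpha = a3 ** (1 / 3)
    return alpha - p / (3 * alpha) - a / 3

def ferrari(p, q, r):
    """Roots of x^4 + p x^2 + q x + r = 0, assuming q != 0."""
    # Equation (16), divided by -8, is a monic cubic in the parameter alpha.
    a0 = one_root_of_cubic(p, p * p / 4 - r, -q * q / 8)
    s = cmath.sqrt(2 * a0)
    roots = []
    for sign in (1, -1):
        # The two quadratics (17): x^2 -/+ s x + (p/2 + a0 +/- q/(2 s)) = 0.
        lin = -sign * s
        const = p / 2 + a0 + sign * q / (2 * s)
        d = cmath.sqrt(lin * lin / 4 - const)
        roots += [-lin / 2 + d, -lin / 2 - d]
    return roots

# x^4 - 25 x^2 + 60 x - 36 = (x - 1)(x - 2)(x - 3)(x + 6)
quartic_roots = ferrari(-25, 60, -36)
print(quartic_roots)
```

When $q = 0$, equation (14) is a quadratic in $x^2$ and can be solved directly, so the restriction costs nothing in practice.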

Remarks on higher-degree equations. Whereas the ancient Greeks
knew the methods for solving quadratic equations, the above-described
methods for solving cubic and quartic equations were discovered
only in the 16th century. This was followed by almost three
centuries of unsuccessful attempts to find formulas expressing by
radicals the roots of any quintic equation (an equation of the fifth
degree with literal coefficients) in terms of its coefficients. These
attempts ceased only after Abel demonstrated, in the 1820's, that
no such formulas can be found for $n$th-degree equations where
$n \geq 5$.

This result of Abel's, however, did not preclude the possibility
that the roots of a concrete polynomial with numerical coefficients
could, in some way, be expressed in terms of the coefficients by
some combination of radicals, or, as we usually say, that any equation
is solvable by radicals. In the 1830's, Galois made a complete
investigation of the conditions under which a given equation is
solvable by radicals. It turned out that for any $n$ equal to or greater
than 5 there are $n$th-degree equations even with integral coefficients
that are not solvable by radicals. Such, for instance, is the equation

$$x^5 - 4x - 2 = 0$$

The investigations of Galois exerted a decisive influence on the
subsequent development of algebra, but they lie outside the scope
of this text.


39. Bounds of Roots 


We know that there is no method by which we can find the exact 
values of the roots of polynomials with numerical coefficients. Ne- 
vertheless, a vast range of problems in mechanics, physics and engi- 
neering at large reduce to the problem of the roots of polynomials, 
which at times are of very high degree. This circumstance spurred 
numerous investigations to find ways of describing the roots of 
a polynomial with numerical coefficients without actually know- 
ing the roots. For example, studies have been made of the location 
of roots in the complex plane (the conditions under which all roots 
lie within the unit circle, that is, are less than unity in absolute 
value, or the conditions prescribing all roots to lie in the left half- 
plane, that is, to have negative real parts, etc.). For polynomials 
with real coefficients, methods have been elaborated for determining
the number of their real roots, for finding the bounds within
which these roots may be located, etc. Finally, much research has
been done into methods of approximation of roots: in engineering 
situations, it is ordinarily enough to know only certain approxi- 
mate values of the roots to within a specified accuracy, and if, say, 
the roots of a polynomial were even written as radicals, the latter 
would in any case be replaced by their approximations. 

At one time, such studies constituted the basic content of higher
algebra. We include here only a very small portion of the pertinent
results, and taking into account the primary demands of applications
we confine ourselves to the case of polynomials with real coefficients
and real roots. In only a few instances will we go farther
afield. We will consider the polynomial $f(x)$ with real coefficients
as a (continuous) real function of a real variable $x$ and wherever
advisable we will take advantage of the results and methods of
mathematical analysis.

A good way to begin the study of the real roots of a polynomial
$f(x)$ with real coefficients is to examine the graph of the polynomial:
obviously, only the abscissas of the points of intersection of the graph
and the $x$-axis are the real roots of the polynomial.
To take an example, let us consider the fifth-degree polynomial

$$h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3$$

On the basis of the results of Sec. 24, we can assert the following
concerning the roots of this polynomial: since its degree is odd, $h(x)$
has at least one real root; but if the number of real roots is greater
than unity, then it is equal to three or five, since complex roots are
pairwise conjugate.

An examination of the graph of the polynomial $h(x)$ enables us
to say a good deal more about the roots. We construct the graph
(Fig. 9; note that the scale on the $x$-axis is ten times that on the
$y$-axis), taking only integral values of $x$ and computing the
corresponding values of $h(x)$, say by the Horner method:

Fig. 9





  x  |  h(x)
 -4  |  -39
 -3  |  144
 -2  |   83
 -1  |   18
  0  |   -3
  1  |   -4
  2  |   39

We see that the polynomial $h(x)$ has in any case three real roots:
the positive root $\alpha_1$ and two negative roots $\alpha_2$ and $\alpha_3$,

$$1 < \alpha_1 < 2, \quad -1 < \alpha_2 < 0, \quad -4 < \alpha_3 < -3$$
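
The table of values and the intervals containing the roots can be reproduced with a few lines of code. The sketch below is illustrative only; the names `horner`, `H`, `table` and `sign_changes` are assumptions made for this example. It evaluates $h(x)$ by the Horner method at the integers from $-4$ to 2 and lists the unit intervals at whose end points $h$ takes values of opposite sign.

```python
def horner(coeffs, x):
    """Value at x of the polynomial with coefficients a_0, a_1, ..., a_n
    (leading coefficient first), computed by the Horner method."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

# h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3
H = [1, 2, -5, 8, -7, -3]

table = {x: horner(H, x) for x in range(-4, 3)}

# A change of sign between consecutive integers locates a real root,
# since h is a continuous function.
sign_changes = [(x, x + 1) for x in range(-4, 2) if table[x] * table[x + 1] < 0]

print(table)
print(sign_changes)
```

A sign change guarantees a root in the interval, but the absence of one does not rule out an even number of roots there; the check therefore confirms the three roots found above without showing that there are no others.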


Ordinarily, the information on the (real) roots of a polynomial
that we get by examining the graph is very satisfactory in a practical
sense. However, the doubt always remains as to whether we have
indeed found all the roots. For instance, in the case at hand we did
not show that to the right of $x = 2$ and to the left of $x = -4$ there
are no roots of the polynomial. What is more, since we only took
integral values of $x$, we can assume that the graph we constructed
does not very accurately reflect the true behaviour of the function
$h(x)$; it may not, say, take into account the smaller fluctuations
and so loses some roots.

True, we could have taken values down to 0.1 or 0.01, in addition
to the integral values of $x$. But then the computations would
have been severely complicated and doubts would still remain.
On the other hand, we could apply mathematical analysis to test
the function $h(x)$ for maxima and minima and thus compare our
graph with the true behaviour of the function; but this brings us to
the problem of the roots of the derivative $h'(x)$, which is the same
kind of problem we are dealing with right now.

The need is evident for more sophisticated procedures enabling 
us to find the bounds within which lie the real roots of a polynomial 
with real coefficients and to determine the number of the roots. We 
shall examine the problem of the bounds of real roots and leave 
the question of the number of roots to later sections. 

The proof of the lemma on the modulus of the highest-degree
term (see Sec. 23) already provides a certain bound for the absolute
values of the roots of a polynomial. Indeed, setting $k = 1$ in
inequality (3), Sec. 23, we find that for

$$|x| \geq 1 + \frac{A}{|a_0|} \tag{1}$$

where $a_0$ is the leading coefficient and $A$ is the maximum of the
absolute values of the remaining coefficients, the absolute value
of the highest-degree term of the polynomial is greater than the
absolute value of the sum of all the other terms, and so no value of
$x$ which satisfies inequality (1) can be a root of this polynomial.
Thus, for a polynomial $f(x)$ with arbitrary numerical coefficients,
the number $1 + \frac{A}{|a_0|}$ serves as an upper bound of the moduli
(absolute values) of all its roots, real and complex. For the case above
of the polynomial $h(x)$, this bound, since $a_0 = 1$, $A = 8$, is the
number 9.

However, this bound is usually too high, particularly if we are
only interested in the bounds of the real roots. We now give certain
more precise methods. It is well to bear in mind that if the bounds
are indicated within which the real roots of a polynomial are to be
found, this does not in the least mean that such roots actually exist.
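
The bound $1 + \frac{A}{|a_0|}$ is simple enough to compute in one line. The following Python sketch is illustrative; the function name `modulus_bound` is an assumption, and the coefficients are listed with the leading coefficient first.

```python
def modulus_bound(coeffs):
    """Upper bound 1 + A/|a_0| for the moduli of all roots of the polynomial
    a_0 x^n + a_1 x^(n-1) + ... + a_n, where A = max(|a_1|, ..., |a_n|)."""
    a0, rest = coeffs[0], coeffs[1:]
    return 1 + max(abs(a) for a in rest) / abs(a0)

# h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3: a_0 = 1, A = 8.
bound_h = modulus_bound([1, 2, -5, 8, -7, -3])
print(bound_h)
```

Since `abs` also accepts complex numbers, the same function applies to polynomials with complex coefficients.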

Let us first demonstrate that it is sufficient to be able to find only
the upper bound of the positive roots of any polynomial. Let there be
given a polynomial $f(x)$ of degree $n$ and let $N_0$ be the upper bound
of its positive roots. We consider the polynomials

$$\varphi_1(x) = x^n f\left(\frac{1}{x}\right),$$
$$\varphi_2(x) = f(-x),$$
$$\varphi_3(x) = x^n f\left(-\frac{1}{x}\right)$$

and find the upper bounds of their positive roots. Suppose these are
the numbers, respectively, $N_1$, $N_2$, $N_3$. Then the number
$\frac{1}{N_1}$ will be the lower bound of the positive roots of the
polynomial $f(x)$: if $\alpha$ is a positive root of $f(x)$, then
$\frac{1}{\alpha}$ will be a positive root of $\varphi_1(x)$ and
from $\frac{1}{\alpha} < N_1$ follows $\alpha > \frac{1}{N_1}$. Similarly,
the numbers $-N_2$ and $-\frac{1}{N_3}$ serve, respectively, as the lower
and upper bounds of the negative roots of the polynomial $f(x)$. Thus,
all positive roots of $f(x)$ satisfy the inequalities
$\frac{1}{N_1} < x < N_0$, all negative roots, the inequalities
$-N_2 < x < -\frac{1}{N_3}$.


To determine the upper bound of the positive roots we can use
the following method. Suppose we have the polynomial

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_n$$

with real coefficients, and $a_0 > 0$. Let $a_k$, $k \geq 1$, be the first
of the negative coefficients; if there were no such coefficients, then the
polynomial $f(x)$ could not have any positive roots at all. Finally,
let $B$ be the greatest of the absolute values of the negative
coefficients. Then the number

$$1 + \sqrt[k]{\frac{B}{a_0}}$$

serves as the upper bound for the positive roots of the polynomial $f(x)$.

Indeed, setting $x > 1$ and replacing each of the coefficients
$a_1, a_2, \ldots, a_{k-1}$ by the number zero, and each of the coefficients
$a_k, a_{k+1}, \ldots, a_n$ by the number $-B$, we can only diminish the
value of the polynomial, that is,

$$f(x) \geq a_0 x^n - B\left(x^{n-k} + x^{n-k-1} + \ldots + x + 1\right) = a_0 x^n - B\,\frac{x^{n-k+1} - 1}{x - 1}$$

or, because $x > 1$,

$$f(x) > a_0 x^n - \frac{B x^{n-k+1}}{x - 1} = \frac{x^{n-k+1}}{x - 1}\left[a_0 x^{k-1}(x - 1) - B\right] \tag{2}$$

If

$$x > 1 + \sqrt[k]{\frac{B}{a_0}} \tag{3}$$

then, since

$$a_0 x^{k-1}(x - 1) - B \geq a_0 (x - 1)^k - B > 0$$

the expression in square brackets in formula (2) will prove to be
positive; thus, by (2), the value of $f(x)$ will be strictly positive.
Thus, the values of $x$ which satisfy the inequality (3) cannot be roots
of $f(x)$, which is what we set out to prove.

Taking the above-considered polynomial $h(x)$, this method
(since $k = 2$, $B = 7$) yields for the upper bound of the positive
roots the number $1 + \sqrt{7}$, which can be replaced by the nearest
greater integer 4.
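
The rule just applied to $h(x)$ can be written out as follows. This Python sketch is illustrative; the function name `positive_root_bound` is an assumption, coefficients are listed with the leading coefficient first, and `None` is returned when there are no negative coefficients (in which case, as noted above, there are no positive roots at all).

```python
import math

def positive_root_bound(coeffs):
    """Upper bound 1 + (B/a_0)^(1/k) for the positive roots of
    a_0 x^n + ... + a_n with real coefficients and a_0 > 0, where a_k is the
    first negative coefficient and B the largest absolute value among the
    negative coefficients."""
    a0 = coeffs[0]
    if a0 <= 0:
        raise ValueError("the leading coefficient must be positive")
    negatives = [(k, a) for k, a in enumerate(coeffs) if a < 0]
    if not negatives:
        return None                  # no positive roots exist
    k = negatives[0][0]              # index of the first negative coefficient
    B = max(-a for _, a in negatives)
    return 1 + (B / a0) ** (1 / k)

# h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3: k = 2, B = 7.
bound = positive_root_bound([1, 2, -5, 8, -7, -3])
nearest_integer = math.ceil(bound)
print(bound, nearest_integer)
```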

Of the many other methods of finding the upper bound of positive
roots, we give Newton's method. It is more involved than the one we
just gave above, but ordinarily it yields a very good result.

Suppose we have a polynomial $f(x)$ with real coefficients and
positive leading coefficient $a_0$. If, for $x = c$, the polynomial $f(x)$
and all its successive derivatives $f'(x), f''(x), \ldots, f^{(n)}(x)$ take on
positive values, then the number $c$ serves as the upper bound of the
positive roots.

True enough, by Taylor's formula (see Sec. 23),

$$f(x) = f(c) + (x - c) f'(c) + (x - c)^2\,\frac{f''(c)}{2!} + \ldots + (x - c)^n\,\frac{f^{(n)}(c)}{n!}$$

We see that if $x \geq c$, then on the right we get a strictly positive
number, that is, such values of $x$ cannot be the roots of $f(x)$.

When seeking the appropriate number $c$ for a given polynomial
$f(x)$, it is useful to do as follows. The derivative $f^{(n)}(x) = n!\,a_0$ is
a positive number, and so the polynomial $f^{(n-1)}(x)$ is an increasing
function of $x$. Consequently, there is a number $c_1$ such that for
$x \ge c_1$ the derivative $f^{(n-1)}(x)$ is positive. Whence it follows that
for $x \ge c_1$ the derivative $f^{(n-2)}(x)$ will be an increasing function of
$x$ and therefore there exists a number $c_2$, $c_2 \ge c_1$, such that for
$x \ge c_2$ the derivative $f^{(n-2)}(x)$ is also positive. Continuing thus, we
finally arrive at the desired number $c$.

Applying Newton's method to the polynomial $h(x)$ considered
above, we have

$$h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3,$$
$$h'(x) = 5x^4 + 8x^3 - 15x^2 + 16x - 7,$$
$$h''(x) = 20x^3 + 24x^2 - 30x + 16,$$
$$h'''(x) = 60x^2 + 48x - 30,$$
$$h^{IV}(x) = 120x + 48,$$
$$h^{V}(x) = 120$$

It is easy to verify (say, by the Horner method) that all these
polynomials are positive for $x = 2$. Thus the number 2 is the upper
bound for the positive roots of the polynomial $h(x)$. This result
is much more exact than those obtained by other methods.
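The verification at $x = 2$ can be sketched in a few lines. This is only an illustration under stated assumptions: coefficients are listed from the highest degree down, the helper names are hypothetical, and the search simply tries the integers 0, 1, 2, ... in turn rather than working downwards from the highest derivative as described above.

```python
def derivative(coeffs):
    """Coefficients (highest degree first) of the derivative."""
    n = len(coeffs) - 1
    return [a * (n - i) for i, a in enumerate(coeffs[:-1])]


def horner(coeffs, x):
    """Evaluate the polynomial at x by the Horner method."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


def newton_condition_holds(coeffs, c):
    """True if f and all of its derivatives are positive at x = c."""
    p = list(coeffs)
    while p:                      # stops after the constant f^(n)
        if horner(p, c) <= 0:
            return False
        p = derivative(p)
    return True


def smallest_integer_bound(coeffs, limit=1000):
    """Smallest integer c in 0..limit satisfying Newton's condition."""
    for c in range(limit + 1):
        if newton_condition_holds(coeffs, c):
            return c
    return None


h = [1, 2, -5, 8, -7, -3]
c = smallest_integer_bound(h)
```

For $h(x)$ the integers 0 and 1 fail already at $h$ itself, since $h(0) = -3$ and $h(1) = -4$, while at 2 all six values are positive.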


To find a lower bound for the negative roots of polynomial $h(x)$,
let us consider the polynomial $\varphi_1(x) = -h(-x)$*. Since

$$\varphi_1(x) = x^5 - 2x^4 - 5x^3 - 8x^2 - 7x + 3,$$
$$\varphi_1'(x) = 5x^4 - 8x^3 - 15x^2 - 16x - 7,$$
$$\varphi_1''(x) = 20x^3 - 24x^2 - 30x - 16,$$
$$\varphi_1'''(x) = 60x^2 - 48x - 30,$$
$$\varphi_1^{IV}(x) = 120x - 48,$$
$$\varphi_1^{V}(x) = 120$$

and all these polynomials are positive for $x = 4$ (as may readily be
checked), the number 4 serves as an upper bound for the positive
roots of $\varphi_1(x)$, and so the number $-4$ will be a lower bound for the
negative roots of $h(x)$.
Finally, let us consider the polynomials

$$\varphi_2(x) = -x^5\,h\!\left(\frac{1}{x}\right) = 3x^5 + 7x^4 - 8x^3 + 5x^2 - 2x - 1,$$
$$\varphi_3(x) = -x^5\,h\!\left(-\frac{1}{x}\right) = 3x^5 - 7x^4 - 8x^3 - 5x^2 - 2x + 1$$

For them, again using the Newton method, we find the numbers 1
and 4 as upper bounds for the positive roots, and so the number
$\frac{1}{1} = 1$ is the lower bound for the positive roots of $h(x)$ and the
number $-\frac{1}{4}$ is the upper bound for the negative roots.

* We take $-h(-x)$ in place of $h(-x)$ because Newton's method is applicable
only if the leading coefficient is positive. This change of sign of course has no
effect whatsoever on the roots of the polynomial $h(-x)$.

Thus, the positive roots of $h(x)$ lie between 1 and 2 and the
negative roots lie between the numbers $-4$ and $-\frac{1}{4}$. This result is in
very good agreement with what we found earlier when we examined
the graph.
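All four bounds come from the same test applied to transformed coefficient lists: passing to $f(-x)$ flips the sign of every odd-power coefficient, and passing to $x^n f(1/x)$ reverses the list. The sketch below assumes a nonzero constant term and integer coefficients; the names `reflect`, `reverse` and `newton_integer_bound` are hypothetical, and the search starts at 1 so that the reciprocal is always defined.

```python
from fractions import Fraction


def all_derivatives_positive(coeffs, c):
    """True if the polynomial and all its derivatives are positive at c."""
    p = list(coeffs)
    while p:
        value = 0
        for a in p:                       # Horner evaluation
            value = value * c + a
        if value <= 0:
            return False
        n = len(p) - 1
        p = [a * (n - i) for i, a in enumerate(p[:-1])]
    return True


def newton_integer_bound(coeffs, limit=1000):
    """Smallest integer c >= 1 passing Newton's test (sign normalized)."""
    if coeffs[0] < 0:                     # make the leading coefficient positive
        coeffs = [-a for a in coeffs]
    for c in range(1, limit + 1):
        if all_derivatives_positive(coeffs, c):
            return c
    raise ValueError("no bound found up to the given limit")


def reflect(coeffs):
    """Coefficients of f(-x)."""
    n = len(coeffs) - 1
    return [a * (-1) ** (n - i) for i, a in enumerate(coeffs)]


def reverse(coeffs):
    """Coefficients of x^n f(1/x); assumes a nonzero constant term."""
    return coeffs[::-1]


h = [1, 2, -5, 8, -7, -3]
upper_pos = newton_integer_bound(h)
lower_neg = -newton_integer_bound(reflect(h))
lower_pos = Fraction(1, newton_integer_bound(reverse(h)))
upper_neg = Fraction(-1, newton_integer_bound(reverse(reflect(h))))
```

Because only integers are tried, the bounds found this way are in general coarser than the best value of $c$; for $h(x)$ they coincide with the numbers 2, $-4$, 1 and $-\frac{1}{4}$ obtained above.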


40. Sturm’s Theorem 


We now come to the question of the number of real roots of a poly- 
nomial f (x) with real coefficients. We will be interested both in the 
total number of real roots, and, separately, the number of positive 
and the number of negative roots and the total number of roots in 
the interval between specified bounds a and b. There are several 
methods for finding the exact number of roots and all of them are 
very cumbersome; the most convenient one is the Sturm method, 
which we now discuss. 

First let us introduce a definition that will be needed in the 
next section as well. 

Suppose we have a finite ordered sequence of real numbers 
different from zero, say 


4, 3, —2, 1, —4, —8, —3, 4, 4 (1) 
Write down the signs of these numbers in succession:

+, +, -, +, -, -, -, +, + (2)


We see that there are four variations of sign in (2). We then say that 
in the ordered sequence (1) there are four variations in sign. The num- 
ber of variations in sign can of course be counted for any finite orde- 
red sequence of nonzero real numbers. 
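Counting variations in sign amounts to comparing neighbouring signs. A minimal sketch follows; the function name `sign_variations` is hypothetical, and zeros are skipped so that the same helper also serves the later convention of deleting zero entries.

```python
def sign_variations(numbers):
    """Number of sign changes between consecutive nonzero entries."""
    signs = [x > 0 for x in numbers if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


# The sequence (1) of the text
sequence = [4, 3, -2, 1, -4, -8, -3, 4, 4]
count = sign_variations(sequence)
```

Applied to sequence (1), the count is four, in agreement with the sign pattern (2).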

Now let us consider the polynomial $f(x)$ with real coefficients;
we will assume that $f(x)$ does not have multiple roots, for otherwise
we could divide it by the greatest common divisor of $f(x)$ and its
derivative. The finite ordered sequence of nonzero polynomials with
real coefficients

$$f(x) = f_0(x),\ f_1(x),\ f_2(x),\ \ldots,\ f_s(x) \qquad (3)$$

is called the Sturm sequence for the polynomial $f(x)$ if the following
requirements are met:

(1) Successive polynomials of (3) do not have common roots.

(2) The last polynomial, $f_s(x)$, does not have real roots.

(3) If $\alpha$ is a real root of one of the intermediate polynomials
$f_k(x)$ of (3), $1 \le k \le s - 1$, then $f_{k-1}(\alpha)$ and $f_{k+1}(\alpha)$ have
different signs.

(4) If $\alpha$ is a real root of $f(x)$, then the product $f(x) f_1(x)$ changes
sign from minus to plus when $x$ increases and passes through the
point $\alpha$.

The question of whether every polynomial has a Sturm sequence
will be considered later on; for the present let us suppose that $f(x)$
does have such a sequence and let us show how it can be used to
find the number of real roots.

If a real number $c$ is not a root of the given polynomial $f(x)$
and (3) is a Sturm sequence for this polynomial, then take the set of
real numbers

$$f(c),\ f_1(c),\ f_2(c),\ \ldots,\ f_s(c)$$

delete all numbers equal to zero and denote by $W(c)$ the number
of variations in sign in the remaining sequence; we call $W(c)$ the
number of variations in sign in the Sturm sequence (3) of polynomial
$f(x)$ for $x = c$.*

The following theorem holds.

Sturm's theorem. If the real numbers $a$ and $b$, $a < b$, are not
roots of a polynomial $f(x)$ which does not have any multiple
roots, then $W(a) \ge W(b)$ and the difference $W(a) - W(b)$ is equal
to the number of real roots of $f(x)$ in the interval between $a$ and $b$.

Thus, to determine the number of real roots of a polynomial
$f(x)$ lying between $a$ and $b$ [recall that $f(x)$ does not, by hypothesis,
have multiple roots], it suffices to establish the reduction in the
number of variations of sign in the Sturm sequence of this polynomial
when moving from $a$ to $b$.

To prove this theorem, let us see how the number $W(x)$ varies
with increasing $x$. So long as $x$, as it increases, does not encounter any
of the roots of the Sturm sequence (3), the signs of the polynomials
of the sequence do not change and so the number $W(x)$ remains
unaltered. For this reason, and also because of Condition (2) of the
definition of a Sturm sequence, it remains for us to consider two
cases: the passage of $x$ through a root of one of the intermediate
polynomials $f_k(x)$, $1 \le k \le s - 1$, and the passage of $x$ through a root
of the polynomial $f(x)$ itself.

Let $\alpha$ be a root of the polynomial $f_k(x)$, $1 \le k \le s - 1$. Then,
by Condition (1), $f_{k-1}(\alpha)$ and $f_{k+1}(\alpha)$ are different from zero. We
can thus find a positive number $\varepsilon$, which may be very small, such
that in the interval $(\alpha - \varepsilon, \alpha + \varepsilon)$ the polynomials $f_{k-1}(x)$ and
$f_{k+1}(x)$ do not have any roots and therefore preserve constant signs;
Condition (3) states that these signs are distinct. From this it
follows that each of the sequences of numbers

$$f_{k-1}(\alpha - \varepsilon),\ f_k(\alpha - \varepsilon),\ f_{k+1}(\alpha - \varepsilon) \qquad (4)$$

and

$$f_{k-1}(\alpha + \varepsilon),\ f_k(\alpha + \varepsilon),\ f_{k+1}(\alpha + \varepsilon) \qquad (5)$$

has exactly one variation in sign, irrespective of the signs of the
numbers $f_k(\alpha - \varepsilon)$ and $f_k(\alpha + \varepsilon)$. Thus, for instance, if the
polynomial $f_{k-1}(x)$ is negative on the interval in question and $f_{k+1}(x)$ is
positive and if $f_k(\alpha - \varepsilon) > 0$, $f_k(\alpha + \varepsilon) < 0$, then the sequences
(4) and (5) are associated with the sign sequences

-, +, +;   -, -, +

* Quite naturally, the variations in sign in the Sturm sequence of the
polynomial $f(x)$ have nothing in common with the variation in sign of the
polynomial $f(x)$ itself, which variation occurs when $x$ passes through a root
of the polynomial.


Thus, when $x$ passes through a root of one of the intermediate
polynomials in Sturm's sequence, the variations in sign in the sequence
can only shift position, but do not disappear or reappear, and so the
number $W(x)$ does not change in such a transition.

On the other hand, let $\alpha$ be a root of the given polynomial $f(x)$.
By Condition (1), $\alpha$ will not be a root of $f_1(x)$. Hence, there is a
positive number $\varepsilon$ such that the interval $(\alpha - \varepsilon, \alpha + \varepsilon)$ does not contain
any roots of the polynomial $f_1(x)$, and therefore $f_1(x)$ preserves its
sign over this interval. If the sign is positive, then, by Condition
(4), the polynomial $f(x)$ itself changes sign from minus to plus when
$x$ passes through $\alpha$, i.e., $f(\alpha - \varepsilon) < 0$, $f(\alpha + \varepsilon) > 0$. Hence, to
the number sequences

$$f(\alpha - \varepsilon),\ f_1(\alpha - \varepsilon) \quad \text{and} \quad f(\alpha + \varepsilon),\ f_1(\alpha + \varepsilon) \qquad (6)$$

there correspond the sign sequences

-, +   and   +, +

Thus, the Sturm sequence loses one variation in sign. But if the
sign of $f_1(x)$ is negative on the interval $(\alpha - \varepsilon, \alpha + \varepsilon)$, then again,
by Condition (4), the polynomial $f(x)$ changes sign from plus to
minus as $x$ passes through $\alpha$, i.e., $f(\alpha - \varepsilon) > 0$, $f(\alpha + \varepsilon) < 0$.
To the number sequences (6) there now correspond the sign sequences

+, -   and   -, -

The Sturm sequence again loses one variation in sign.

Thus, as $x$ increases, the number $W(x)$ changes only when $x$ passes
through a root of the polynomial $f(x)$; in this case it is diminished
exactly by unity.

This obviously proves the Sturm theorem. To use it for finding
the total number of real roots of a polynomial $f(x)$, it is sufficient
to take, for $a$, the lower limit of the negative roots, and for $b$, the
upper limit of the positive roots. It is simpler however to do as
follows. By the lemma proved in Sec. 23 there exists a positive number
$N$, which may be very large, such that for $|x| > N$ the signs of all
polynomials of the Sturm sequence will coincide with the signs of
their highest-degree terms. In other words, there exists a positive
value of the unknown $x$ which is so large that the signs of the
corresponding values of all the polynomials of the Sturm sequence coincide
with the signs of their leading coefficients. This value of $x$, which
need not be computed, can be denoted by $\infty$. On the other hand, there
exists a negative value of $x$ which is so large in absolute value that
the signs of the corresponding values of the polynomials of the
Sturm sequence coincide with the signs of their leading coefficients
for polynomials of even degree and are opposite to the signs of the
leading coefficients for polynomials of odd degree. Let us agree
to denote this value of $x$ by $-\infty$. In the interval $(-\infty, \infty)$ we
obviously have all the real roots of all the polynomials of Sturm's sequence
and, in particular, all the real roots of the polynomial $f(x)$. Applying
the Sturm theorem to this interval, we find the number of these
roots; application of the Sturm theorem to the intervals $(-\infty, 0)$
and $(0, \infty)$ yields, respectively, the number of negative and the
number of positive roots of the polynomial $f(x)$.

It remains to demonstrate that any polynomial $f(x)$ with real
coefficients and without multiple roots has a Sturm sequence. Of a
variety of methods used for constructing such a sequence, we give the
most widely used one. Set $f_1(x) = f'(x)$, thus ensuring fulfillment
of Condition (4) of the definition of a Sturm sequence. Indeed, if
$\alpha$ is a real root of the polynomial $f(x)$, then $f'(\alpha) \ne 0$. If $f'(\alpha) > 0$,
then $f'(x) > 0$ in the neighbourhood of the point $\alpha$ and therefore
$f(x)$ changes sign from minus to plus when $x$ passes through $\alpha$; this is
then also true for the product $f(x) f_1(x)$. Similar reasoning is
likewise valid for $f'(\alpha) < 0$. Then divide $f(x)$ by $f_1(x)$ and take the
remainder (with reversed sign) for $f_2(x)$:

$$f(x) = f_1(x)\,q_1(x) - f_2(x)$$

Generally, if the polynomials $f_{k-1}(x)$ and $f_k(x)$ have already been
found, then $f_{k+1}(x)$ will be the remainder after dividing $f_{k-1}(x)$
by $f_k(x)$, taken with reversed sign:

$$f_{k-1}(x) = f_k(x)\,q_k(x) - f_{k+1}(x) \qquad (7)$$

This method differs from the Euclidean algorithm as applied
to the polynomials $f(x)$ and $f'(x)$ solely in the fact that the sign
of the remainder is reversed every time, and the next division
is performed by the remainder with reversed sign. Since such a
variation in sign is inessential when seeking the greatest common divisor,
our process will terminate at some $f_s(x)$, which is the greatest
common divisor of the polynomials $f(x)$ and $f'(x)$; since $f(x)$ has
no multiple roots [it is prime to $f'(x)$] it will follow that $f_s(x)$ is
actually some nonzero real number.

This implies that the sequence of polynomials we have constructed,

$$f(x) = f_0(x),\ f'(x) = f_1(x),\ f_2(x),\ \ldots,\ f_s(x)$$

also satisfies Condition (2) of the definition of a Sturm sequence.
To prove that Condition (1) is met, assume that the consecutive
polynomials $f_k(x)$ and $f_{k+1}(x)$ have a common root $\alpha$. Then, by (7),
$\alpha$ will also be a root of the polynomial $f_{k-1}(x)$. Passing to the
equation

$$f_{k-2}(x) = f_{k-1}(x)\,q_{k-1}(x) - f_k(x)$$

we find that $\alpha$ is a root of $f_{k-2}(x)$ as well. Continuing, we find that
$\alpha$ is a common root of $f(x)$ and $f'(x)$, which is in conflict with our
assumptions. Finally, fulfillment of Condition (3) follows directly
from equation (7): if $f_k(\alpha) = 0$, then $f_{k-1}(\alpha) = -f_{k+1}(\alpha)$.

Let us apply the Sturm method to the polynomial

$$h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3$$

which we considered in the preceding section. We will not make
a preliminary check to see that $h(x)$ does not have any multiple
roots, because the method of constructing a Sturm sequence as
given above is a simultaneous check on the relative primality of the
polynomial and its derivative.

Let us find a Sturm sequence for $h(x)$ by using this method. In
the division process, we will (in contrast to the Euclidean algorithm)
multiply and divide only by arbitrary positive numbers since the
signs of the remainders play a fundamental role in the Sturm method.
We obtain the following sequence:

$$h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3,$$
$$h_1(x) = 5x^4 + 8x^3 - 15x^2 + 16x - 7,$$
$$h_2(x) = 66x^3 - 150x^2 + 172x + 61,$$
$$h_3(x) = -464x^2 + 1135x + 723,$$
$$h_4(x) = -32{,}599{,}457x - 8{,}486{,}093,$$
$$h_5(x) = -1$$

We determine the signs of the polynomials of this sequence for
$x = -\infty$ and $x = \infty$; to do this, we (as indicated above) only
examine the signs of the leading coefficients and the degrees of the
polynomials. We get the following table:

| $x$ | $h(x)$ | $h_1(x)$ | $h_2(x)$ | $h_3(x)$ | $h_4(x)$ | $h_5(x)$ | Number of variations in sign |
|---|---|---|---|---|---|---|---|
| $-\infty$ | - | + | - | - | + | - | 4 |
| $\infty$ | + | + | + | - | - | - | 1 |

Thus, when $x$ passes from $-\infty$ to $\infty$, the Sturm sequence loses
three variations in sign and so the polynomial $h(x)$ has exactly
three real roots. It will be recalled that when we constructed the
graph of this polynomial (in the preceding section) we did not lose
a single root.
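The two rows of signs at $-\infty$ and $\infty$ follow from the leading coefficients and degrees alone, so they can be checked without any division. The sketch below takes the Sturm sequence of $h(x)$ exactly as listed in the text; the helper names are hypothetical.

```python
def sign_at_infinity(coeffs, positive=True):
    """Sign (+1 or -1) of a polynomial at +infinity or -infinity.

    coeffs are listed from the highest degree down; at -infinity the sign
    of the leading coefficient is reversed for polynomials of odd degree.
    """
    sign = 1 if coeffs[0] > 0 else -1
    degree = len(coeffs) - 1
    if not positive and degree % 2 == 1:
        sign = -sign
    return sign


def variations(signs):
    """Number of changes between consecutive signs."""
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


# Sturm sequence of h(x) as given in the text
sturm_h = [
    [1, 2, -5, 8, -7, -3],
    [5, 8, -15, 16, -7],
    [66, -150, 172, 61],
    [-464, 1135, 723],
    [-32599457, -8486093],
    [-1],
]

w_minus = variations([sign_at_infinity(p, positive=False) for p in sturm_h])
w_plus = variations([sign_at_infinity(p, positive=True) for p in sturm_h])
real_roots = w_minus - w_plus
```

The difference `w_minus - w_plus` is the number of real roots counted by Sturm's theorem on $(-\infty, \infty)$.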

Let us apply the Sturm method to a simpler polynomial:

$$f(x) = x^3 + 3x^2 - 1$$

Let us find the number of its real roots and also the integral bounds
within which each of the roots is located. We shall not construct
the graph of this polynomial.

The Sturm sequence associated with the polynomial $f(x)$ is

$$f(x) = x^3 + 3x^2 - 1,$$
$$f_1(x) = 3x^2 + 6x,$$
$$f_2(x) = 2x + 1,$$
$$f_3(x) = 1$$

Let us find the number of variations of sign in this sequence
for $x = -\infty$ and $x = \infty$:

| $x$ | $f(x)$ | $f_1(x)$ | $f_2(x)$ | $f_3(x)$ | Number of variations in sign |
|---|---|---|---|---|---|
| $-\infty$ | - | + | - | + | 3 |
| $\infty$ | + | + | + | + | 0 |

Consequently, the polynomial $f(x)$ has three real roots. For a more
precise location of the roots, continue the above table:

| $x$ | $f(x)$ | $f_1(x)$ | $f_2(x)$ | $f_3(x)$ | Number of variations in sign |
|---|---|---|---|---|---|
| $-3$ | - | + | - | + | 3 |
| $-2$ | + | 0 | - | + | 2 |
| $-1$ | + | - | - | + | 2 |
| $0$ | - | 0 | + | + | 1 |
| $1$ | + | + | + | + | 0 |

Thus, the Sturm sequence of the polynomial $f(x)$ loses one
variation of sign each time $x$ moves from $-3$ to $-2$, from $-1$ to 0 and
from 0 to 1. The roots $\alpha_1$, $\alpha_2$ and $\alpha_3$ of this polynomial thus satisfy
the inequalities

$$-3 < \alpha_1 < -2, \quad -1 < \alpha_2 < 0, \quad 0 < \alpha_3 < 1$$
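The whole procedure, from the division with reversed remainders to the count $W(a) - W(b)$, can be sketched with exact rational arithmetic. This is an illustration only: it assumes the input has no multiple roots, keeps the remainders unscaled (so they may differ from hand-computed ones by positive factors, which does not affect signs), and uses hypothetical function names.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of num divided by den (coefficients, highest degree first)."""
    num = [Fraction(a) for a in num]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)                        # the leading term is now zero
    while num and num[0] == 0:            # strip remaining leading zeros
        num.pop(0)
    return num


def sturm_sequence(f):
    """f, f', then successive remainders taken with reversed sign."""
    f = [Fraction(a) for a in f]
    n = len(f) - 1
    seq = [f, [a * (n - i) for i, a in enumerate(f[:-1])]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-a for a in r])


def evaluate(p, x):
    value = Fraction(0)
    for a in p:
        value = value * x + a
    return value


def W(seq, c):
    """Variations in sign of the sequence at x = c, zeros deleted."""
    signs = [v > 0 for v in (evaluate(p, c) for p in seq) if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


seq = sturm_sequence([1, 3, 0, -1])       # f(x) = x^3 + 3x^2 - 1
counts = [W(seq, c) for c in (-3, -2, -1, 0, 1)]
```

Here `counts` reproduces the last column of the second table, and differences of neighbouring entries give the number of roots in each unit interval.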


41. Other Theorems on the Number of Real Roots 


The Sturm theorem completely resolves the question of the num- 
ber of real roots of a polynomial, but it has one essential defect and 
that is the cumbersome computations involved in constructing a 
Sturm sequence, as the reader could see after performing all the 
computations of the first example above. We now prove two theorems 
which do not yield the exact number of real roots but only bound 
the number from above. These theorems are employed after a graph 
has been used to bound the number of real roots from below and at 
times enable us to find the exact number of real roots without resor- 
ting to the Sturm method. 

Suppose we have an $n$th-degree polynomial $f(x)$ with real
coefficients; we assume it can have multiple roots. Let us consider a
sequence of its consecutive derivatives:

$$f(x) = f^{(0)}(x),\ f'(x),\ f''(x),\ \ldots,\ f^{(n-1)}(x),\ f^{(n)}(x) \qquad (1)$$

of which the last one is equal to the leading coefficient $a_0$ of $f(x)$
multiplied by $n!$ and for this reason preserves sign at all times. If
a real number $c$ is not a root of any one of the polynomials of the
sequence (1), then by $S(c)$ we denote the number of variations in
sign in the ordered sequence of numbers

$$f(c),\ f'(c),\ f''(c),\ \ldots,\ f^{(n-1)}(c),\ f^{(n)}(c)$$

Thus, we can consider the integer-valued function $S(x)$ defined
for those values of $x$ which do not make any one of the polynomials
in (1) vanish.

Let us see how $S(x)$ varies with increasing $x$. The number $S(x)$
remains unchanged until $x$ passes through a root of one of the
polynomials of (1). We thus have two cases to consider: the passage of $x$
through a root of the polynomial $f(x)$ and the passage of $x$ through
a root of one of the derivatives $f^{(k)}(x)$, $1 \le k \le n - 1$.

Let $\alpha$ be an $l$-fold root of the polynomial $f(x)$, $l \ge 1$, i.e.,

$$f(\alpha) = f'(\alpha) = \ldots = f^{(l-1)}(\alpha) = 0, \quad f^{(l)}(\alpha) \ne 0$$

Let a positive number $\varepsilon$ be so small that the interval $(\alpha - \varepsilon, \alpha + \varepsilon)$
does not contain any roots of the polynomials $f(x), f'(x), \ldots,
f^{(l-1)}(x)$ different from $\alpha$ and does not contain any root of the
polynomial $f^{(l)}(x)$ either. We will prove that in the number sequence

$$f(\alpha - \varepsilon),\ f'(\alpha - \varepsilon),\ \ldots,\ f^{(l-1)}(\alpha - \varepsilon),\ f^{(l)}(\alpha - \varepsilon)$$

any two consecutive numbers have opposite signs, whereas all the
numbers

$$f(\alpha + \varepsilon),\ f'(\alpha + \varepsilon),\ \ldots,\ f^{(l-1)}(\alpha + \varepsilon),\ f^{(l)}(\alpha + \varepsilon)$$

have the same sign. Since each one of the polynomials of (1) is a
derivative of the preceding polynomial, all we have to prove is that
if $x$ passes through the root $\alpha$ of polynomial $f(x)$, then, irrespective
of the multiplicity of this root, $f(x)$ and $f'(x)$ had different signs
prior to the passage and have coincident signs after the passage.
If $f(\alpha - \varepsilon) > 0$, then $f(x)$ diminishes on the interval $(\alpha - \varepsilon, \alpha)$,
and so $f'(\alpha - \varepsilon) < 0$; but if $f(\alpha - \varepsilon) < 0$, then $f(x)$ increases and
so $f'(\alpha - \varepsilon) > 0$. Hence in both cases the signs differ. On the
other hand, if $f(\alpha + \varepsilon) > 0$, then $f(x)$ increases on the interval
$(\alpha, \alpha + \varepsilon)$ and so $f'(\alpha + \varepsilon) > 0$; similarly, from $f(\alpha + \varepsilon) < 0$
it follows that $f'(\alpha + \varepsilon) < 0$. Thus, after the passage through the
root $\alpha$, the signs of $f(x)$ and $f'(x)$ must coincide.

From what has been proved it follows that when $x$ passes through
an $l$-fold root of the polynomial $f(x)$ the sequence

$$f(x),\ f'(x),\ \ldots,\ f^{(l-1)}(x),\ f^{(l)}(x)$$

loses $l$ variations in sign.

Now let $\alpha$ be a root of the derivatives

$$f^{(k)}(x),\ f^{(k+1)}(x),\ \ldots,\ f^{(k+l-1)}(x), \quad 1 \le k \le n - 1, \quad l \ge 1$$

but not a root of $f^{(k-1)}(x)$ or of $f^{(k+l)}(x)$. By what has been proved
above, the passage of $x$ through $\alpha$ implies a loss in the sequence

$$f^{(k)}(x),\ f^{(k+1)}(x),\ \ldots,\ f^{(k+l-1)}(x),\ f^{(k+l)}(x)$$

of $l$ variations in sign. True, this passage possibly creates a new
variation in sign between $f^{(k-1)}(x)$ and $f^{(k)}(x)$; however, because
$l \ge 1$, the number of variations in sign, when $x$ passes through $\alpha$, in
the sequence

$$f^{(k-1)}(x),\ f^{(k)}(x),\ f^{(k+1)}(x),\ \ldots,\ f^{(k+l-1)}(x),\ f^{(k+l)}(x)$$

either does not change or decreases. It can then decrease only by an
even number since the polynomials $f^{(k-1)}(x)$ and $f^{(k+l)}(x)$ do not
change sign when $x$ passes through the value $\alpha$.

These results imply that if the numbers $a$ and $b$, $a < b$, are not
roots of any one of the polynomials of the sequence (1), then the number of
real roots of the polynomial $f(x)$ lying between $a$ and $b$ (each counted
according to its multiplicity) is equal to the difference $S(a) - S(b)$
or is less than this difference by an even number.

In order to relax the restrictions imposed on the numbers $a$ and
$b$, let us introduce the following notations. Suppose the real number
$c$ is not a root of the polynomial $f(x)$, though it may be a root of some
of the other polynomials of the sequence (1). Denote by $S_+(c)$ the
number of variations in sign in the number sequence

$$f(c),\ f'(c),\ f''(c),\ \ldots,\ f^{(n-1)}(c),\ f^{(n)}(c) \qquad (2)$$

which is computed as follows: if

$$f^{(k)}(c) = f^{(k+1)}(c) = \ldots = f^{(k+l-1)}(c) = 0 \qquad (3)$$

but

$$f^{(k-1)}(c) \ne 0, \quad f^{(k+l)}(c) \ne 0 \qquad (4)$$

then we take it that $f^{(k)}(c), f^{(k+1)}(c), \ldots, f^{(k+l-1)}(c)$ have the
same sign as $f^{(k+l)}(c)$; this is obviously the same as deleting the zeros
in a count of the number of variations of sign in the sequence (2).
On the other hand, by $S_-(c)$ we denote the number of variations of
sign in the sequence (2), which is counted as follows: if conditions
(3) and (4) hold, then we take it that $f^{(k+i)}(c)$, $0 \le i \le l - 1$, has
the same sign as $f^{(k+l)}(c)$ if the difference $l - i$ is even, and opposite
sign if this difference is odd.

If we now desire to determine the number of real roots of the
polynomial $f(x)$ between $a$ and $b$, $a < b$, and $a$ and $b$ are not roots
of $f(x)$ but, possibly, are roots of the other polynomials of the
sequence (1), then we do as follows. Let $\varepsilon$ be so small that the interval
$(a, a + 2\varepsilon)$ does not contain any roots of $f(x)$, or any roots (distinct
from $a$) of the other polynomials of the sequence (1); on the other
hand, let $\eta$ be so small that the interval $(b - 2\eta, b)$ also fails to
contain any roots of $f(x)$ and any roots (distinct from $b$) of the other
polynomials of the sequence (1). Then the number we want of real
roots of the polynomial $f(x)$ will be equal to the number of the real
roots of this polynomial between $a + \varepsilon$ and $b - \eta$, that is, from
what has been proved, it will be equal to the difference
$S(a + \varepsilon) - S(b - \eta)$ or less than this difference by an even number.
However, it is easy to see that

$$S(a + \varepsilon) = S_+(a), \quad S(b - \eta) = S_-(b)$$

This proves the following theorem.

Budan-Fourier theorem. If the real numbers $a$ and $b$, $a < b$, are
not roots of a polynomial $f(x)$ with real coefficients, then the number
of real roots of this polynomial between $a$ and $b$, each counted according
to its multiplicity, is equal to the difference $S_+(a) - S_-(b)$ or is
an even number less than this difference.

Use the symbol $\infty$ to denote a positive value of the unknown $x$
so large that the signs of the associated values of all the polynomials
of the sequence (1) coincide with the signs of their leading
coefficients. Since these coefficients are, sequentially, the numbers $a_0$,
$n a_0$, $n(n - 1) a_0$, ..., $n!\,a_0$, whose signs coincide, it follows that
$S(\infty) = S_-(\infty) = 0$. On the other hand, since

$$f(0) = a_n, \quad f'(0) = a_{n-1}, \quad f''(0) = a_{n-2}\,2!, \quad f'''(0) = a_{n-3}\,3!, \quad \ldots, \quad f^{(n)}(0) = a_0\,n!$$

where $a_0, a_1, \ldots, a_n$ are coefficients of the polynomial $f(x)$, then
$S_+(0)$ coincides with the number of variations in sign in the sequence
of coefficients of $f(x)$, zero coefficients being deleted. Thus, applying
the Budan-Fourier theorem to the interval $(0, \infty)$ we arrive at the
following theorem.

Descartes' theorem (Descartes' rule of signs). The number of
positive roots of a polynomial $f(x)$, a root of multiplicity $m$ being
counted as $m$ roots, is equal to the number of variations in sign in the
sequence of coefficients of this polynomial (zero coefficients are not
counted) or is less by an even number.

To determine the number of negative roots of the polynomial
$f(x)$ it is obviously sufficient to apply Descartes' theorem to the
polynomial $f(-x)$. If none of the coefficients of $f(x)$ is zero, then,
obviously, changes of sign in the sequence of coefficients of the
polynomial $f(-x)$ will be associated with preservation of signs in the
sequence of coefficients of the polynomial $f(x)$, and conversely. Thus,
if the polynomial $f(x)$ does not have zero coefficients, then the number
of its negative roots (counting multiplicities) is equal to the number of
preservations of signs in the sequence of coefficients or is less by an even
number.
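Descartes' rule needs nothing but the coefficient list. The sketch below counts the variations for $f(x)$ and for $f(-x)$ and lists the root counts the rule allows; the function names are hypothetical, and coefficients are given from the highest degree down.

```python
def sign_variations(coeffs):
    """Sign changes in a coefficient sequence, zero coefficients skipped."""
    signs = [a > 0 for a in coeffs if a != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def reflect(coeffs):
    """Coefficients of f(-x)."""
    n = len(coeffs) - 1
    return [a * (-1) ** (n - i) for i, a in enumerate(coeffs)]


def possible_counts(variations):
    """Values allowed by the rule: variations, variations - 2, ..."""
    return list(range(variations, -1, -2))


h = [1, 2, -5, 8, -7, -3]          # h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3
positive_options = possible_counts(sign_variations(h))
negative_options = possible_counts(sign_variations(reflect(h)))
```

For this polynomial the rule leaves three or one positive roots and two or no negative roots; deciding between the alternatives requires additional information.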

We give another proof of the Descartes theorem that does not 
rely on the Budan-Fourier theorem. We first prove the following 
lemma. 

If c> 0, then the number of variations of sign in the sequence of 
coefficients of the polynomial f (x) is less than the number of variations 
of sign in the sequence of coefficients of the product (x — c) f (x) by an 
odd number. 

Indeed, enclosing in parentheses successive terms of the same
sign, we can write the polynomial $f(x)$, the leading coefficient $a_0$
of which can be considered positive, as follows:

$$f(x) = (a_0 x^n + \ldots + b_1 x^{k_1+1}) - (a_1 x^{k_1} + \ldots + b_2 x^{k_2+1}) + \ldots + (-1)^s (a_s x^{k_s} + \ldots + b_{s+1} x^t) \qquad (5)$$

Here, $a_0 > 0$, $a_1 > 0$, ..., $a_s > 0$, whereas $b_1, b_2, \ldots, b_s$ are
positive or zero, but $b_{s+1}$ is considered strictly positive, that is, $x^t$,
where $t \ge 0$, is the smallest power of the unknown $x$ that enters into
the polynomial $f(x)$ with a nonzero coefficient. The parenthesis

$$(a_0 x^n + \ldots + b_1 x^{k_1+1})$$

may accidentally consist of a single term, namely, when $k_1 + 1 = n$.
An analogous remark is applicable to the other parentheses of
formula (5).

Now write a polynomial equal to the product $(x - c) f(x)$; we will
single out only those terms which contain $x$ to the powers $n + 1$,
$k_1 + 1$, ..., $k_s + 1$, and $t$. We obtain

$$(x - c) f(x) = (a_0 x^{n+1} + \ldots) - (a_1' x^{k_1+1} + \ldots) + \ldots + (-1)^s (a_s' x^{k_s+1} + \ldots - c\,b_{s+1} x^t) \qquad (6)$$

where $a_i' = a_i + c\,b_i$, $i = 1, 2, \ldots, s$, and therefore, since $c > 0$,
all the $a_i'$ are strictly positive. Thus, there was one change of sign
in the sequence of coefficients of the polynomial $f(x)$ between the
terms $a_0 x^n$ and $-a_1 x^{k_1}$ (also between the terms $-a_1 x^{k_1}$ and $a_2 x^{k_2}$,
etc.), whereas in the polynomial $(x - c) f(x)$ there will either be
one change of sign between the corresponding terms $a_0 x^{n+1}$ and
$-a_1' x^{k_1+1}$ (respectively between the terms $-a_1' x^{k_1+1}$ and $a_2' x^{k_2+1}$, etc.)
or more changes (but always more by an even number). We are not
interested in the exact places of these changes in sign. It may happen,
for example, that the coefficient of $x^{k_1+2}$ in (6) is negative, like the
coefficient $-a_1'$, and so there is no change of sign between these two
successive coefficients; that is to say, the change in sign in the first
parenthesis is located at some previous position. Now notice that
the last parenthesis in (5) did not have any variation in sign, whereas
the last parenthesis in (6) did have variations in sign, an odd number
of them: it suffices to note that the last nonzero coefficients of
the polynomials $f(x)$ and $(x - c) f(x)$, that is, $(-1)^s b_{s+1}$ and
$(-1)^{s+1} b_{s+1} c$, have different signs. Thus, between $f(x)$ and
$(x - c) f(x)$ the total number of variations of sign in the sequence
of coefficients invariably increases and by an odd number (the sum
of several terms, one of which is odd and the others even, will
naturally be odd!). The lemma is proved.
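The lemma can be observed numerically by multiplying a coefficient list by $x - c$ and comparing the two counts. This is only a spot check on a few hypothetical test polynomials, not a substitute for the proof; the helper names are assumptions of the sketch.

```python
def sign_variations(coeffs):
    """Sign changes in a coefficient sequence, zero coefficients skipped."""
    signs = [a > 0 for a in coeffs if a != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def multiply_by_linear(coeffs, c):
    """Coefficients of (x - c) * f(x), highest degree first."""
    shifted = coeffs + [0]                     # x * f(x)
    scaled = [0] + [c * a for a in coeffs]     # c * f(x)
    return [s - t for s, t in zip(shifted, scaled)]


def gain(coeffs, c):
    """Increase in the number of sign variations caused by the factor x - c."""
    return sign_variations(multiply_by_linear(coeffs, c)) - sign_variations(coeffs)


samples = [[1, 1, 1], [1, 2, -5, 8, -7, -3], [1, 0, -7, 6]]
gains = [gain(f, c) for f in samples for c in (1, 2, 5)]
```

Every entry of `gains` should be a positive odd number, as the lemma asserts for $c > 0$.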

To prove Descartes' theorem, denote all the positive roots of the
polynomial $f(x)$ by $\alpha_1, \alpha_2, \ldots, \alpha_k$. Then

$$f(x) = (x - \alpha_1)(x - \alpha_2) \ldots (x - \alpha_k)\,\varphi(x)$$

where $\varphi(x)$ is a polynomial with real coefficients which now has
no positive real roots. This implies that the first and the last
nonzero coefficients of the polynomial $\varphi(x)$ are of the same sign, which
means that the sequence of coefficients of this polynomial contains
an even number of variations of sign. Applying the above-proved
lemma to the polynomials

$$\varphi(x),\ (x - \alpha_1)\,\varphi(x),\ (x - \alpha_2)(x - \alpha_1)\,\varphi(x),\ \ldots,\ f(x)$$

in succession, we find that the number of variations of sign in the
sequence of coefficients increases each time by an odd number, that
is to say, by unity plus an even number, and so the number of
variations of sign in the sequence of coefficients of the polynomial $f(x)$
is greater than $k$ by an even number.

Let us apply the theorems of Descartes and Budan-Fourier to
the earlier considered polynomial

$$h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3$$

The number of variations of sign in the sequence of coefficients
is three, and so by Descartes' theorem, $h(x)$ can have three positive
roots or one. On the other hand, $h(x)$ has no zero coefficients, and
since the sequence of coefficients has two preservations of sign,
$h(x)$ either has two negative roots or none. We compare with the
results obtained earlier with the aid of the graph and see that two
is the exact number of negative roots of our polynomial.

To determine exactly the number of positive roots, use the
Budan-Fourier theorem, applying it to the interval $(1, \infty)$, since in Sec. 39
it was demonstrated that 1 serves as a lower bound to the positive
roots of the polynomial $h(x)$. The successive derivatives of $h(x)$
were also written out in Sec. 39. Let us find their signs for $x = 1$
and $x = \infty$:

| $x$ | $h(x)$ | $h'(x)$ | $h''(x)$ | $h'''(x)$ | $h^{IV}(x)$ | $h^{V}(x)$ | Number of variations in sign |
|---|---|---|---|---|---|---|---|
| $1$ | - | + | + | + | + | + | 1 |
| $\infty$ | + | + | + | + | + | + | 0 |

From this it follows that when $x$ moves from 1 to $\infty$ the sequence
of derivatives loses one change of sign, and so $h(x)$ has exactly one
positive root.
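The row for $x = 1$ can be recomputed from the derivative sequence. The sketch below evaluates $f, f', \ldots, f^{(n)}$ at a point and counts variations with zeros deleted, which is the count $S_+(c)$ of the text; the function names are hypothetical. Since all derivatives of $h(x)$ are already positive at $x = 2$, that point gives the same sign pattern as $\infty$.

```python
def derivative_chain(coeffs):
    """Coefficient lists of f, f', ..., f^(n), highest degree first."""
    chain = [list(coeffs)]
    while len(chain[-1]) > 1:
        p = chain[-1]
        n = len(p) - 1
        chain.append([a * (n - i) for i, a in enumerate(p[:-1])])
    return chain


def horner(coeffs, x):
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


def S_plus(coeffs, c):
    """Variations in sign of f(c), f'(c), ..., f^(n)(c), zeros deleted."""
    values = [horner(p, c) for p in derivative_chain(coeffs)]
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


h = [1, 2, -5, 8, -7, -3]
lost = S_plus(h, 1) - S_plus(h, 2)
```

At $x = 0$ the same function returns the number of sign variations among the coefficients, which is how Descartes' theorem was derived from the Budan-Fourier theorem.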

In connection with this example, it should be noted that,
generally speaking, when seeking the number of real roots of a
polynomial it is best to begin by constructing a graph and applying the
theorems of Descartes and Budan-Fourier, and then only in extreme
cases to go on to construct a Sturm sequence.

The Descartes theorem admits of a certain refinement in the
special case when we know beforehand that all the roots of the
polynomial are real, as for instance in the case of the characteristic
polynomial of a symmetric matrix. Namely,

If all the roots of a polynomial $f(x)$ are real, and the constant term
is nonzero, then the number $k_1$ of positive roots of the polynomial is
equal to the number $s_1$ of variations in sign in the sequence of its
coefficients, and the number $k_2$ of negative roots is equal to the number $s_2$
of variations in sign in the sequence of coefficients of the polynomial
$f(-x)$.

Indeed, under our assumptions,

$$k_1 + k_2 = n \qquad (7)$$

where $n$ is the degree of the polynomial $f(x)$, and, by Descartes'
theorem,

$$k_1 \le s_1, \quad k_2 \le s_2 \qquad (8)$$

We will prove that

$$s_1 + s_2 \le n \qquad (9)$$

We will prove it by induction with respect to $n$, since for $n = 1$,
due to $a_0 \ne 0$, $a_1 \ne 0$, only one of the polynomials

$$f(x) = a_0 x + a_1, \quad f(-x) = -a_0 x + a_1$$

has a change of sign; that is, for this case, $s_1 + s_2 = 1$. Let
formula (9) be proved for polynomials whose degree is less than $n$. If

$$f(x) = a_0 x^n + a_{n-l} x^l + \ldots + a_n$$

where $l \le n - 1$, $a_{n-l} \ne 0$, we set

$$g(x) = a_{n-l} x^l + \ldots + a_n$$

Then

$$f(x) = a_0 x^n + g(x), \quad f(-x) = (-1)^n a_0 x^n + g(-x)$$

If $s_1'$ and $s_2'$ are, respectively, the numbers of variations in sign in
the sequences of coefficients of the polynomials $g(x)$ and $g(-x)$,
then, by the induction hypothesis (it is clear that $l \ge 1$),

$$s_1' + s_2' \le l$$

If $l = n - 1$, then the variation in sign in the first place, i.e., for
$f(x)$, between $a_0$ and $a_1 = a_{n-l}$, will occur only in the case of one of
the polynomials $f(x)$, $f(-x)$, and so

$$s_1 + s_2 = s_1' + s_2' + 1 \le l + 1 = n$$

But if $l \le n - 2$, then variations of sign are possible in the first
places of each of the polynomials $f(x)$, $f(-x)$; however, in this case
as well,

$$s_1 + s_2 \le s_1' + s_2' + 2 \le l + 2 \le n - 2 + 2 = n$$

Comparing (7), (8) and (9), we see that

$$k_1 = s_1, \quad k_2 = s_2$$

The proof is complete.
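A small check of this refinement: build a polynomial from prescribed real nonzero roots and compare the counts of sign variations with the numbers of positive and negative roots. The roots 1, 2 and $-3$ are a hypothetical example chosen for this sketch, as are the function names.

```python
def poly_from_roots(roots):
    """Coefficients of the monic polynomial with the given roots."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                      # x * p(x)
        scaled = [0] + [r * a for a in coeffs]      # r * p(x)
        coeffs = [s - t for s, t in zip(shifted, scaled)]
    return coeffs


def sign_variations(coeffs):
    signs = [a > 0 for a in coeffs if a != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def reflect(coeffs):
    """Coefficients of f(-x)."""
    n = len(coeffs) - 1
    return [a * (-1) ** (n - i) for i, a in enumerate(coeffs)]


roots = [1, 2, -3]                   # all real, none equal to zero
f = poly_from_roots(roots)           # x^3 - 7x + 6
s1 = sign_variations(f)
s2 = sign_variations(reflect(f))
k1 = sum(1 for r in roots if r > 0)
k2 = sum(1 for r in roots if r < 0)
```

Here $s_1 = k_1 = 2$ and $s_2 = k_2 = 1$, although one coefficient of the polynomial is zero.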


42. Approximation of Roots 


The methods described in the preceding sections enable us to
isolate the real roots of a polynomial $f(x)$ with real coefficients,
that is to say, they permit indicating for each root the interval
containing it alone. If the interval is small enough, then any number
in the interval may be taken as an approximation of the desired
root. Thus, after it has been demonstrated by the Sturm method
(or any other more efficient method) that there is only one root of
the polynomial $f(x)$ between the rational numbers $a$ and $b$, the
problem remains of narrowing this interval so that the new limits
$a'$ and $b'$ possess a prescribed number of coincident first decimals.
The desired root will thus be computed to the needed accuracy.

There are many methods which permit us to speedily approximate the 
value of a root with any desired accuracy. We will describe two. 
They are simple theoretically and general enough so that when used 
in conjunction they quickly yield results. The methods we are about 
to describe can be applied not only to polynomials but also to the 
broader classes of continuous functions. 

Fig. 10 

From here on we assume that $\alpha$ is a simple root of a polynomial $f(x)$, 
since we can always dispose of multiple roots, and that the root $\alpha$ is 
isolated between the limits $a$ and $b$, $a < \alpha < b$; this implies, for one 
thing, that $f(a)$ and $f(b)$ have different signs. 

The method of linear interpolation (also called the method of 
false position or regula falsi). For an approximate value of the root 
$\alpha$ we could take, say, the half sum of the limits $a$ and $b$, 
$\frac{a + b}{2}$, i.e., the midpoint of the interval from $a$ to $b$. It is 
more natural, however, to assume that the root is closer to that endpoint 
of the interval $(a, b)$ which corresponds to the smaller absolute value of 
the polynomial. The method of linear interpolation consists in taking 
for the approximate value of the root $\alpha$ a number $c$ that divides 
the interval $(a, b)$ into parts proportional to the absolute values 
of the numbers $f(a)$ and $f(b)$; that is, 
$$\frac{c - a}{b - c} = -\frac{f(a)}{f(b)}$$
The sign of the right member is minus because $f(a)$ and $f(b)$ have 
different signs. Whence 
$$c = \frac{b f(a) - a f(b)}{f(a) - f(b)} \qquad (1)$$
Geometrically, as Fig. 10 indicates, the method of linear interpolation 
consists in replacing the curve $y = f(x)$ on the interval 
$(a, b)$ by its chord connecting the points $A (a, f(a))$ and $B (b, f(b))$; 
for the approximate value of the root $\alpha$ we take the abscissa of the 
point of intersection of the chord and the $x$-axis. 
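
Formula (1) can be sketched in a few lines of Python. The function name `false_position` and the test function $x^2 - 2$ on the interval $(1, 2)$ are assumptions made only for this illustration.

```python
def false_position(f, a, b):
    """Return the abscissa c at which the chord through (a, f(a)) and
    (b, f(b)) crosses the x-axis, i.e. formula (1)."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have different signs")
    return (b * fa - a * fb) / (fa - fb)


# Illustrative choice: f(x) = x^2 - 2, whose root sqrt(2) lies in (1, 2).
c = false_position(lambda x: x * x - 2, 1.0, 2.0)
```

For this function the chord gives $c = 4/3$, which lies closer to the endpoint $a = 1$, where the absolute value of the function is smaller.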
Newton's method. Since $\alpha$ is a simple root of the polynomial 
$f(x)$, it follows that $f'(\alpha) \ne 0$. We also assume that $f''(\alpha) \ne 0$ 
since otherwise the problem would reduce to computing the root of 
the polynomial $f''(x)$ of lower degree than $f(x)$. We likewise assume 
that the interval $(a, b)$ does not contain roots of $f(x)$ different from 
$\alpha$, neither does it contain any root of the polynomial $f'(x)$ or the 
polynomial $f''(x)$.* Thus, as follows from mathematical analysis, 
the curve $y = f(x)$ is either monotonic increasing on the interval 
$(a, b)$ or monotonic decreasing; also, it is either convex up at all 
points of the interval or convex down at all points. Consequently, 
there are four cases (shown in Figs. 11 to 14) of the location of the 
curve on the interval $(a, b)$. 

Fig. 11    Fig. 12    Fig. 13    Fig. 14 

Denote by $a_0$ the endpoint $a$ or $b$ at which the sign of $f(x)$ coincides 
with the sign of $f''(x)$. Since $f(a)$ and $f(b)$ have different signs, 
and $f''(x)$ preserves sign throughout the interval $(a, b)$, such an $a_0$ 
can be indicated. In the cases given in Figs. 11 and 14, $a_0 = a$; 
in the other two cases, $a_0 = b$. At the point of the curve $y = f(x)$ 
with abscissa $a_0$, that is, at the point with coordinates $(a_0, f(a_0))$, 
draw a tangent line to this curve and denote by $d$ the abscissa of the 
intersection point of this tangent with the $x$-axis. Figs. 11 to 14 
show that the number $d$ may be taken as an approximate value of 
the root $\alpha$. The Newton method thus consists in replacing the curve 
$y = f(x)$ on the interval $(a, b)$ by its tangent at one of the endpoints 
of the interval. The condition imposed on the choice of the point 
$a_0$ is very essential. Fig. 15 shows that if this condition is not 
observed, the intersection point of the tangent line and the $x$-axis may 
not give an approximation to the desired root at all. 

Fig. 15 

* There is usually no difficulty in narrowing the interval so that this 
condition is satisfied, since the methods given earlier permit establishing the 
number of roots of the polynomials $f'(x)$ and $f''(x)$ in any interval. 

Let us derive a formula for finding the number $d$. We recall that 
the equation of the tangent to the curve $y = f(x)$ at the point 
$(a_0, f(a_0))$ may be written as 
$$y - f(a_0) = f'(a_0)(x - a_0)$$
Substituting the coordinates $(d, 0)$ of the point of intersection of the 
tangent line with the $x$-axis, we get 
$$-f(a_0) = f'(a_0)(d - a_0)$$
whence 
$$d = a_0 - \frac{f(a_0)}{f'(a_0)} \qquad (2)$$

Fig. 16 


If in Figs. 11-14 the reader connects A and B by chords, he will 
see that in all cases the methods of linear interpolation and of Newton 
yield approximations to the true value of the root $\alpha$ from different sides. 
It is therefore advisable, if the interval $(a, b)$ is such as required 
by Newton's method, to combine the two methods. In this way we 
obtain much closer endpoints $c$ and $d$ for the root $\alpha$. If the accuracy 
of the approximation is not sufficient, apply both methods (see 
Fig. 16) once again to the interval, and so on. We can demonstrate 
that this process does indeed permit computing the root $\alpha$ to any 
desired accuracy. 
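
The combined procedure may be sketched as follows. This is only an illustration under the assumptions stated in the text (different signs at the endpoints, no roots of $f'$ or $f''$ in the interval); the names `combined_step` and `refine`, the tolerance, and the test function $x^2 - 2$ are our own.

```python
def combined_step(f, df, d2f, a, b):
    """One round of the combined method on an interval (a, b) assumed to
    satisfy the conditions of Newton's method."""
    fa, fb = f(a), f(b)
    c = (b * fa - a * fb) / (fa - fb)      # linear interpolation, formula (1)
    a0 = a if fa * d2f(a) > 0 else b       # endpoint where f has the sign of f''
    d = a0 - f(a0) / df(a0)                # Newton's method, formula (2)
    return min(c, d), max(c, d)            # c and d lie on different sides of the root


def refine(f, df, d2f, a, b, tol=1e-10, max_rounds=50):
    """Repeat the combined step until the interval is no longer than tol."""
    for _ in range(max_rounds):
        if b - a <= tol:
            break
        a, b = combined_step(f, df, d2f, a, b)
    return a, b


# Illustrative choice: f(x) = x^2 - 2 on (1, 2); f' = 2x and f'' = 2 have no roots there.
f = lambda x: x * x - 2
df = lambda x: 2 * x
d2f = lambda x: 2.0
first = combined_step(f, df, d2f, 1.0, 2.0)   # (4/3, 3/2)
lo, hi = refine(f, df, d2f, 1.0, 2.0)
```

Both endpoints move toward the root in every round, so in this example a handful of rounds is enough; with floating-point numbers the tolerance should not be set below the rounding error of the evaluations.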

Let us apply these methods to the polynomial 
$$h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3$$
which we dealt with in preceding sections. 

We know that this polynomial has a simple root $\alpha_1$ lying between 
1 and 2. We can say right off that these limits are too broad for the 
methods of linear interpolation and of Newton, used only once each, 
to yield a decent result. However, let us employ them so as to have 
one example that does not require involved computations. 

As we saw in Sec. 41, for $x = 1$ the derivatives $h'(x)$, $h''(x)$, ..., 
$h^{(5)}(x)$ receive positive values. This implies, on the basis of the 
results of Sec. 39, that the value $x = 1$ serves as an upper bound of 
the positive roots for $h'(x)$ and also for $h''(x)$. Hence, the interval 
$(1, 2)$ does not contain any roots of these derivatives and so we can 
apply the Newton method. Besides, $h''(x)$ is positive everywhere 
in the interval, and since 
$$h(1) = -4, \quad h(2) = 39$$
we have to take $a_0 = 2$. Seeing that $h'(2) = 109$, we get, by formula (2), 
$$d = 2 - \frac{39}{109} = 1.64\ldots$$
On the other hand, formula (1) yields 
$$c = \frac{2 \cdot (-4) - 1 \cdot 39}{-4 - 39} = 1.09\ldots$$
and, consequently, the root $\alpha_1$ lies within the interval 
$$1.09 < \alpha_1 < 1.65$$


This narrowing of the interval that we obtained is too slight 
to consider the result satisfactory. We could of course apply our 
methods to the new interval, but it is more advisable from the very 
beginning to find a sufficiently small interval for $\alpha_1$, say to within 
0.1 or even 0.01, and only then apply the methods. Quite naturally, 
this at once makes all the computations very cumbersome, but in 
the solution of concrete problems requiring exact knowledge of the 
roots of a polynomial, this has to be done. 

Let us return to our polynomial $h(x)$ and its root $\alpha_1$; note that 
all values of the polynomials given below are computed by the 
Horner method. Since 
$$h(1.3) = -0.13987, \quad h(1.31) = 0.0662923851$$
it follows that 
$$1.3 < \alpha_1 < 1.31$$
that is, we have the value of the root $\alpha_1$ to an accuracy of 0.01. Now 
let us apply the method of linear interpolation to the new interval: 
$$c = \frac{1.31 \cdot (-0.13987) - 1.3 \cdot 0.0662923851}{-0.13987 - 0.0662923851} = \frac{0.26940980063}{0.2061623851} = 1.30678\ldots$$
We also apply Newton's method to this interval, setting $a_0 = 1.31$. Since 
$$h'(1.31) = 20.92822405$$
it follows that 
$$d = 1.31 - \frac{0.0662923851}{20.92822405} = \frac{27.3496811204}{20.92822405} = 1.30683\ldots$$
Thus, 
$$1.30678 < \alpha_1 < 1.30684$$
and therefore, setting $\alpha_1 = 1.30681$, we have an error of less than 
0.00003. 
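
The figures of this worked example can be checked with exact rational arithmetic. The sketch below evaluates $h(x)$ and $h'(x)$ by the Horner method using Python's `fractions.Fraction`; the function name `horner` and the variable names are our own.

```python
from fractions import Fraction


def horner(coeffs, x):
    """Evaluate a polynomial given by its coefficients in descending powers."""
    value = 0
    for coefficient in coeffs:
        value = value * x + coefficient
    return value


H = [1, 2, -5, 8, -7, -3]    # h(x) = x^5 + 2x^4 - 5x^3 + 8x^2 - 7x - 3
DH = [5, 8, -15, 16, -7]     # h'(x) = 5x^4 + 8x^3 - 15x^2 + 16x - 7

a, b = Fraction("1.3"), Fraction("1.31")
ha, hb = horner(H, a), horner(H, b)
c = (b * ha - a * hb) / (ha - hb)     # formula (1) on the interval (1.3, 1.31)
d = b - hb / horner(DH, b)            # formula (2) with a_0 = 1.31

print(float(ha), float(hb), float(c), float(d))
```

Because the arithmetic is exact, the values of $h(1.3)$, $h(1.31)$ and $h'(1.31)$ agree digit for digit with those quoted above.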

We have not yet shown that the foregoing methods actually 
permit computing a root to any desired accuracy, that is to say we 
have not proved the convergence of these methods. Let us do so at 
least with respect to Newton’s method. 

As above, let the simple root $\alpha$ of the polynomial $f(x)$ lie in the 
interval $(a, b)$ chosen as required by the Newton method. For one 
thing, this implies the existence of positive numbers $A$ and $B$ such 
that everywhere on the interval $(a, b)$, 
$$|f'(x)| > A, \quad |f''(x)| < B \qquad (3)$$
We introduce the notation 
$$C = \frac{B}{2A}$$
and assume that 
$$C(b - a) < 1 \qquad (4)$$
To fulfil this inequality it may be necessary to replace the interval 
$(a, b)$ of the root $\alpha$ by a smaller one; but this will not affect the 
validity of inequalities (3). Let $a_0$ be the endpoint of the interval $(a, b)$ 
at which Newton's method is to be applied. On the basis of formula 
(2) we get a succession of approximate values of the root $\alpha$: 
$a_1, a_2, \ldots, a_k, \ldots$, lying in the interval $(a, b)$ and related by 
the equalities 
$$a_k = a_{k-1} - \frac{f(a_{k-1})}{f'(a_{k-1})}, \quad k = 1, 2, \ldots \qquad (5)$$
Let 
$$\alpha = a_k + h_k, \quad k = 0, 1, 2, \ldots \qquad (6)$$
Then 
$$0 = f(\alpha) = f(a_k) + h_k f'(a_k) + \frac{h_k^2}{2} f''(a_k + \theta h_k)$$
where $0 < \theta < 1$. Since $f'(a_k) \ne 0$ due to the condition imposed 
on the interval $(a, b)$, we get, taking into account (5) and (6), 
$$-\frac{h_k^2}{2} \cdot \frac{f''(a_k + \theta h_k)}{f'(a_k)} = \frac{f(a_k)}{f'(a_k)} + h_k = (a_k - a_{k+1}) + (\alpha - a_k) = h_{k+1}$$
Whence 
$$|h_{k+1}| = \frac{h_k^2}{2} \cdot \frac{|f''(a_k + \theta h_k)|}{|f'(a_k)|} < \frac{B}{2A} h_k^2 = C h_k^2, \quad k = 0, 1, 2, \ldots$$
Thus 
$$|h_{k+1}| < C h_k^2 < C^3 h_{k-1}^4 < C^7 h_{k-2}^8 < \ldots < C^{2^{k+1} - 1} h_0^{2^{k+1}}$$
or, since $|h_0| = |\alpha - a_0| < b - a$, 
$$|h_{k+1}| < C^{-1} [C(b - a)]^{2^{k+1}}, \quad k = 0, 1, 2, \ldots \qquad (7)$$
Whence, because of condition (4), it follows that the difference $h_k$ 
between the root $\alpha$ and its approximate value $a_k$ obtained by successive 
application of the Newton method tends to zero with increasing $k$. The 
proof is complete. 

Note that (7) gives an estimate of the error for the (k + 1)th 
step; this is essential if the Newton method is used by itself and 
not in conjunction with the method of linear interpolation. 
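
Estimate (7) can be tried out on the polynomial $h(x)$ of the example above, on the interval $(1.3, 1.31)$. In the sketch below the constants $A$ and $B$ are taken at the endpoints, which relies on our own observation that $h'$ and $h''$ are both increasing for $x \ge 1$; the reference value of the root is simply the result of several further Newton steps, and all names are illustrative.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given by its coefficients in descending powers."""
    value = 0.0
    for coefficient in coeffs:
        value = value * x + coefficient
    return value


H = [1, 2, -5, 8, -7, -3]     # h(x)
DH = [5, 8, -15, 16, -7]      # h'(x)
D2H = [20, 24, -30, 16]       # h''(x)

a, b = 1.3, 1.31
A = horner(DH, a)             # lower bound for |h'| on (a, b), assuming h' increasing
B = horner(D2H, b)            # upper bound for |h''| on (a, b), assuming h'' increasing
C = B / (2 * A)

# Newton iterates a_0 = b, a_1, a_2, ... by formula (5).
iterates = [b]
for _ in range(6):
    x = iterates[-1]
    iterates.append(x - horner(H, x) / horner(DH, x))

alpha = iterates[-1]                                   # reference value of the root
errors = [abs(alpha - x) for x in iterates[1:3]]       # |h_1|, |h_2|
bounds = [(C * (b - a)) ** (2 ** (k + 1)) / C for k in range(2)]   # right side of (7)
```

Only the first two steps are compared with the bound, since beyond that the errors fall below what floating-point arithmetic can resolve.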

Texts dealing with the theory of approximations give procedures 
with better arranged computations (that simplify their use) than 
those we have given. Such courses also describe many other methods 
for approximating roots. These include the method of Lobachevsky 
(sometimes erroneously called the Graeffe method). This method 
enables one to find at once the approximate values of all roots, inclu- 
ding complex roots, and does not require a preliminary isolation 
of the roots. However, the computations are extremely unwieldy. 
Underlying this method is the theory of symmetric polynomials, 
which we describe in Chapter 11 below. 


CHAPTER 10 


FIELDS 
AND POLYNOMIALS 


43. Number Rings and Fields 


In the earlier parts of this book we have frequently been in a 
position where we investigated complex numbers or only real numbers 
with the proviso that the results obtained hold true if we restrict 
ourselves to the real numbers (or, correspondingly, that they carry 
over word-for-word to the case of any complex numbers). As a rule, 
in all these cases it might be noted that the theory would hold true 
completely if we confined ourselves solely within the scope of the 
rational numbers. The time has now come to indicate the reasons 
for this parallelism and thus enable us to present the material 
(which follows) in its natural generality, that is to say, in accepted 
algebraic language. To do this, we introduce the concept of a field, 
and also the broader concept (which plays a subsidiary role in our 
course) of a ring. 

Evidently, the systems of all complex, real and rational numbers, 
like the system of all integers, have one property in common: they are 
all closed not only under addition and multiplication, but under sub- 
traction as well. This property of the enumerated number systems 
distinguishes them, say, from the system of positive integers or posi- 
tive real numbers. 

Any system of numbers, complex or (in the particular case) 
real, containing a sum, a difference and a product of any two of its 
numbers is termed a number ring. Thus, the systems of all integers, 
and of rational, real and complex numbers are number rings. On 
the other hand, no system of positive numbers is a ring since if a 
and b are two different numbers, then either a - b, or b - a is 
negative. Neither is a system of negative numbers a ring because 
the product of two negative numbers is positive. 

The four examples given above do not by any means exhaust 
the range of number rings. A few more instances will now be given; 
each time it is left to the reader to verify that the number system 
is indeed a ring. 


The even numbers form a ring; generally, for any natural number 
n the collection of integers exactly divisible by n is a ring. The 
odd numbers do not constitute a ring since the sum of two odd numbers 
is an even number. 

Another instance of a ring is the collection of rational numbers 
whose denominators, in lowest terms, are powers of 2. This collec- 
tion includes, for example, all integers, since when simplified their 
denominators are 1, that is, two to the power zero. In this example, 
in place of 2 we can of course take any prime number p. Generally, 
taking any (finite or infinite) set of prime numbers and considering 
the system of rational numbers whose simplified denominators are 
divisible only by primes belonging to the given set, we again get a 
ring. On the other hand, the collection of rational numbers whose 
simplified denominators are not divisible by the square of any prime 
will not be a ring, since the indicated property of the numbers is not 
preserved in their multiplication. 
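
Two of these examples are easy to try out with exact fractions. The sketch below spot-checks closure of the rationals whose denominators are powers of 2 on a small sample, and exhibits a product that leaves the collection of rationals with square-free denominators. The helper names and the sample are our own, and a finite sample is of course an illustration, not a proof.

```python
from fractions import Fraction
from itertools import product


def is_dyadic(q):
    """True if the denominator of q in lowest terms is a power of 2."""
    d = q.denominator
    return (d & (d - 1)) == 0


def has_squarefree_denominator(q):
    """True if no square of an integer greater than 1 divides the denominator."""
    d = q.denominator
    p = 2
    while p * p <= d:
        if d % (p * p) == 0:
            return False
        p += 1
    return True


# A small sample of rationals with denominators 1, 2 and 4.
dyadics = [Fraction(n, 2 ** k) for n in range(-3, 4) for k in range(3)]
closed = all(
    is_dyadic(x + y) and is_dyadic(x - y) and is_dyadic(x * y)
    for x, y in product(dyadics, repeat=2)
)

half = Fraction(1, 2)
counterexample = half * half   # 1/4: the denominator is divisible by the square of 2
```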

Let us now examine number rings that do not lie entirely in the 
ring of rational numbers. A collection of numbers of the form 


$$a + b\sqrt{2} \qquad (1)$$
where $a$ and $b$ are any rational numbers, is a ring; in particular, 
this ring includes all rational numbers (for $b = 0$) and also the number 
$\sqrt{2}$ itself (for $a = 0$, $b = 1$). We would also have obtained a ring 
if we had confined ourselves to numbers of the form (1) with integral 
coefficients $a$, $b$. In these examples, we could of course have 
taken $\sqrt{3}$ or $\sqrt{5}$, etc. in place of $\sqrt{2}$. 
The system of numbers of the form 
$$a + b\sqrt[3]{2} \qquad (2)$$
with rational (or only integral) coefficients $a$, $b$ is not a ring because 
the product of $\sqrt[3]{2}$ by itself cannot, as can easily be checked, be 
written as (2).* However, the system of numbers of the form 
$$a + b\sqrt[3]{2} + c\sqrt[3]{4} \qquad (3)$$
with arbitrary rational coefficients $a$, $b$, $c$ is a ring, and this is 
also true if we confine ourselves to the case of integral coefficients. 

* Indeed, let 
$$\sqrt[3]{4} = a + b\sqrt[3]{2} \qquad (2')$$
where the numbers $a$ and $b$ are rational. Multiplying both sides of this 
equation by $\sqrt[3]{2}$, we get 
$$2 = a\sqrt[3]{2} + b\sqrt[3]{4}$$
Substituting the expression (2') for $\sqrt[3]{4}$, we arrive (after some obvious 
manipulations) at the equation 
$$(a + b^2)\sqrt[3]{2} = 2 - ab \qquad (2'')$$
If $a + b^2 \ne 0$, then 
$$\sqrt[3]{2} = \frac{2 - ab}{a + b^2}$$
which is impossible since the number on the right is rational. But if 
$a + b^2 = 0$, then, by (2''), we have $2 - ab = 0$. From these two equations 
follows the fact that $b^3 = -2$, which is again out of the question since the 
number $b$ is rational. 

Let us now consider all real numbers obtainable by applying 
several times the operations of addition, multiplication and subtraction 
to the familiar number pi ($\pi$) and any rational numbers. 
These will be numbers that can be written as 
$$a_0 + a_1 \pi + a_2 \pi^2 + \ldots + a_n \pi^n \qquad (4)$$
where $a_0, a_1, a_2, \ldots, a_n$ are rational numbers, $n \ge 0$. Note that 
no number can have two distinct notations of the type (4), for otherwise, 
by taking their difference, we would find that the number $\pi$ 
satisfies some equation with rational coefficients; now methods of 
mathematical analysis tell us that actually $\pi$ cannot satisfy any 
equation with rational coefficients, which is to say that $\pi$ is 
transcendental. Incidentally, even without taking advantage of this 
result, that is, without assuming that the notation of a number in the form 
(4) is unique, we can show that numbers like (4) constitute a ring. 

Another ring is the collection of numbers obtained from $\pi$ and 
rational numbers via operations of addition, multiplication, subtraction 
and division applied several times. To prove this, there is 
no need to seek a particularly suitable notation for these numbers 
(though it may possibly be found). If the numbers $\alpha$ and $\beta$ are 
obtained from $\pi$ and some rational numbers by the indicated operations, 
then quite naturally the same will be true of the numbers $\alpha + \beta$, 
$\alpha - \beta$, $\alpha\beta$ and also (for $\beta \ne 0$) of the number 
$\frac{\alpha}{\beta}$. 

Finally, if we take the collection of complex numbers $a + bi$ 
with arbitrary rational $a$, $b$, we get a ring; this will also be true 
if we confine ourselves to integral coefficients $a$, $b$. 

The examples given above do not give a full picture of the great 
diversity of number rings. But we will not now continue the list 
of examples and will examine one special and very important type 
of number ring. We of course know that in the systems of rational, 
real, and complex numbers, division (except by zero) is unlimited, 
whereas the system of integers is not closed under division. 
Up to now we paid but slight attention to this difference. Actually, 
it is very essential and brings us to the following definition. 

A number ring is called a number field if it contains the quotient 
of any two of its numbers (the divisor is of course assumed to be 
different from zero). We can thus speak of the field of rational 
numbers, the field of real numbers, the field of complex numbers, 
whereas the ring of integers does not constitute a field. 

Some of the earlier considered examples of number rings are 
actually fields. To begin with, notice that there do not exist number 
fields different from the field of rational numbers and entirely embed- 
ded in it (we do not consider the system of zero alone to be a field). 

Even the following more general assertion holds true. 

The field of rational numbers lies entirely within any number field. 

Indeed, let there be some number field, call it P. If a is any 
number of P different from zero, then P also contains the quotient 
of the division of a by itself, that is, the number 1. Adding unity 
to itself several times, we find that all the natural numbers lie in 
the field P. On the other hand, P must also contain the difference 
a — a, which is the number 0, and so P contains the result of sub- 
tracting any natural number from zero, which is to say, any negative 
integer. Finally, P contains the quotients of all integers, or, gene- 
rally, all rational numbers. 

The field of complex numbers contains many different fields, and 
the field of rational numbers is only the smallest in it. Thus, the 
ring, considered above, of numbers like 
$$a + b\sqrt{2} \qquad (5)$$
with arbitrary rational (and not only integral) coefficients $a$, $b$ is 
a field. To see this, consider the quotient of two numbers of the 
form (5), $a + b\sqrt{2}$ and $c + d\sqrt{2}$; consider the second number to 
be different from zero, hence the number $c - d\sqrt{2}$ is also nonzero, 
and so 
$$\frac{a + b\sqrt{2}}{c + d\sqrt{2}} = \frac{(a + b\sqrt{2})(c - d\sqrt{2})}{(c + d\sqrt{2})(c - d\sqrt{2})} = \frac{ac - 2bd}{c^2 - 2d^2} + \frac{bc - ad}{c^2 - 2d^2}\sqrt{2}$$
We again have a number of type (5), and the coefficients remain 
rational. In this example, the number $\sqrt{2}$ may naturally be replaced 
by the square root of any rational number whose square root 
cannot be taken in the field of rational numbers. Thus, a field is 
made up of numbers of the form $a + bi$ with rational $a$, $b$. 
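
The division rule just derived can be sketched as a small Python class whose elements are pairs of rational coefficients. The class name `QSqrt2` and its layout are our own illustration, not notation from the text.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """A number a + b*sqrt(2) with rational coefficients a and b."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return QSqrt2(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # (a + b sqrt 2)(c + d sqrt 2) = (ac + 2bd) + (ad + bc) sqrt 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def __truediv__(self, other):
        # Multiply numerator and denominator by c - d sqrt 2, as in the text.
        norm = other.a ** 2 - 2 * other.b ** 2
        if norm == 0:
            raise ZeroDivisionError("division by zero in Q(sqrt 2)")
        return QSqrt2((self.a * other.a - 2 * self.b * other.b) / norm,
                      (self.b * other.a - self.a * other.b) / norm)


x = QSqrt2(Fraction(1), Fraction(2))     # 1 + 2 sqrt 2
y = QSqrt2(Fraction(3), Fraction(-1))    # 3 - sqrt 2
quotient = x / y                         # equals 1 + sqrt 2
```

For rational $c$, $d$ the quantity $c^2 - 2d^2$ vanishes only when $c = d = 0$, so the check on `norm` rejects division by zero and nothing else.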


44. Rings 


In various divisions of mathematics, and also in applications 
of mathematics to science and engineering, one often has to perform 
algebraic operations with a variety of nonnumerical entities. The 
preceding chapters of this book afford numerous examples: the 
multiplication and addition of matrices, the addition of vectors, 
operations involving polynomials, operations on linear transformations. 
The general definition of an algebraic operation that is satisfied 
by the operations of addition and multiplication in number 
rings, and also by operations in the indicated examples, consists 
in the following. 

A set M is given that consists either of numbers or of objects 
of a geometrical nature, or, generally, of certain things which we 
will call elements of the set. We say that an algebraic operation is 
defined on the set M if a law is indicated according to which any 
two elements a, b of the set are uniquely associated with some 
third element c which also belongs to M. This operation may be 
called addition, then c is termed the sum of the elements a and b and 
is denoted by the symbol c = a + b; the operation may be called 
multiplication, then c is the product of the elements a and b, c = ab; 
finally, it may be that a new terminology and symbolism will be 
introduced for an operation defined on M. 

In each of the number rings are defined two independent opera- 
tions, addition and multiplication. Subtraction and division will 
not be considered new operations since they are the inverses of addi- 
tion and multiplication if we accept the following general defini- 
tion of an inverse operation. 

Let an algebraic operation, say addition, be defined on the set 
M. Then we say that there is an inverse operation called subtraction 
if for any two elements a, b of M there exists in M an element d 
that is unique and that satisfies the equation b + d = a. The ele- 
ment d is then called the difference between the elements a and b 
and is denoted by the symbol d = a — b. 

It is obvious that in number fields, both addition and multipli- 
cation have inverses. True, there is one restriction relative to multi- 
plication: the divisor must be different from zero. Now in number 
rings that are not fields (say, in the ring of integers), only addition 
has an inverse operation. 

On the other hand, in the system of all polynomials in the 
unknown x, whose coefficients belong to a fixed number field P, there 
are also defined two operations: addition and multiplication, addi- 
tion having the inverse operation of subtraction. 

As we know, both in number rings and in the system of polynomials, 
the operations of addition and multiplication have the following 
properties (a, b, c are arbitrary numbers in the number ring 
under consideration or are arbitrary polynomials in the system at 
hand): 

I. Addition is commutative: a + b = b + a. 

II. Addition is associative: a + (b + c) = (a + b) + c. 

III. Multiplication is commutative: ab = ba. 

IV. Multiplication is associative: a (bc) = (ab) c. 

V. Multiplication is distributive over addition: 
(a + b) c = ac + bc. 

We are now prepared for a general definition of the concept of 
a ring, one of the most important concepts of algebra. 

A set R is called a ring if on it are defined two operations: addi- 
tion and multiplication, both commutative and associative and also 
related by the distributive law, addition having the inverse opera- 
tion of subtraction. 

We thus have the following examples of rings: number rings, 
rings of polynomials in the unknown x with coefficients from the 
given number field or even from the given number ring. Let us take 
one more example which illustrates the breadth of the ring 
concept. 

The course of mathematical analysis begins with a definition 
of a function of a real variable $x$. Let us consider the collection of 
functions that are defined for all real values of $x$ and that take on 
real values; let us define algebraic operations in this collection as 
follows: the sum of two functions $f(x)$ and $g(x)$ is a function whose 
value for any $x = x_0$ is equal to the sum of the values of the given 
functions, that is, it is equal to $f(x_0) + g(x_0)$. The product of these 
functions is a function whose value for every $x = x_0$ is equal to the 
product $f(x_0) \cdot g(x_0)$. For any two functions of the collection at hand, 
there obviously exists a sum and a product. The truth of Properties 
I to V is verified without any difficulty. The addition and 
multiplication of functions reduce to the addition and multiplication 
of their values for any $x$, which is to say, they reduce to operations 
on real numbers, for which Properties I to V hold. Finally, taking 
for the difference of the functions $f(x)$ and $g(x)$ a function whose 
value for any $x = x_0$ is equal to the difference $f(x_0) - g(x_0)$, we 
arrive at the operation of subtraction, the inverse of addition. This 
proves that the collection of functions defined for all real $x$ becomes 
a ring as soon as we introduce (as indicated above) the operations of 
addition and multiplication. 
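
The pointwise operations on functions translate directly into code. In the following sketch the helpers `add`, `mul` and `sub` and the three sample functions are our own choices; the checks at a few integer points merely illustrate Property V and the relation between subtraction and addition, they do not prove them.

```python
def add(f, g):
    """Sum of two functions: the value at x is f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def mul(f, g):
    """Product of two functions: the value at x is f(x) * g(x)."""
    return lambda x: f(x) * g(x)


def sub(f, g):
    """Difference of two functions: the value at x is f(x) - g(x)."""
    return lambda x: f(x) - g(x)


f = lambda x: x * x + 1      # illustrative functions defined for all real x
g = lambda x: 2 * x - 3
h = abs

points = range(-3, 4)
left = mul(add(f, g), h)                 # (f + g) h
right = add(mul(f, h), mul(g, h))        # f h + g h
distributive = all(left(x) == right(x) for x in points)
restores_f = all(add(g, sub(f, g))(x) == f(x) for x in points)   # g + (f - g) = f
```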

Other examples of rings of functions may be obtained by conside- 
ring otherwise defined functions, while preserving the definitions 
of operations on functions given above: functions defined, say, only 
for positive values of the unknown x, or functions defined for values 
of x over the interval [0, 1]. Generally, a system of all the functions 
having some given domain of definition is a ring. We could also obtain 
rings by regarding not all the functions defined in a given domain, 
but only the continuous functions studied in the course of mathema- 
tical analysis. On the other hand, we could consider the complex 
functions of a complex variable. Generally speaking, there are very 
many different function rings, just as there are a great diversity of 
number rings. 

Let us now establish some of the more elementary properties 
of rings which follow directly from the definition of a ring. For 
numbers, these properties are quite ordinary, but the reader will 
possibly be surprised to find that they are consequences only of the 
Conditions I to V and the existence of unique subtraction. 

First a few remarks regarding the significance of Conditions I 
to V. The role of the commutative laws is evident enough. The signi- 
ficance of the associative laws consists in the following: the defini- 
tion of an algebraic operation speaks of the sum or product of only 
two elements. If we attempt to define the product of, say, three 
elements a, b, c, then we have the difficulty that the products au 
and vc, where bc = u, ab = v, may, generally speaking, not coincide, 
that is, $a (bc) \ne (ab) c$. The associative law demands that 
these products be equal to one and the same element of the ring: 
it is natural to take this element for the product abc, written without 
brackets. What is more, the associative law permits defining uniquely 
the product (sum) of any finite number of elements of the ring; that 
is, it permits proving that a product of any n elements is independent 
of the original arrangement of parentheses. 

Let us prove this assertion by means of induction with respect 
to the number $n$. It has already been proved for $n = 3$, and so let 
us assume $n > 3$ and also that for all numbers less than $n$ our assertion 
has already been proved. Let there be elements $a_1, a_2, \ldots, a_n$ 
and let there be some kind of arrangement of parentheses in this 
system indicating the order in which multiplication is to be performed. 
The last step will be the multiplication of the product of the 
first $k$ elements $a_1 a_2 \ldots a_k$ (where $1 \le k \le n - 1$) by the 
product $a_{k+1} a_{k+2} \ldots a_n$. Since these products consist of fewer 
than $n$ factors and for this reason, by hypothesis, are 
uniquely defined, it remains to prove the following equation for 
any $k$ and $l$: 
$$(a_1 a_2 \ldots a_k)(a_{k+1} a_{k+2} \ldots a_n) = (a_1 a_2 \ldots a_l)(a_{l+1} a_{l+2} \ldots a_n)$$
To do this, it will suffice to consider the case $l = k + 1$. But then, 
setting 
$$a_1 a_2 \ldots a_k = b, \quad a_{k+2} a_{k+3} \ldots a_n = c$$
we get, by the associative law, 
$$b (a_{k+1} c) = (b a_{k+1}) c$$
Which proves our assertion. 
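
The assertion can be observed on a small example by forming every arrangement of parentheses explicitly. The argument above uses only the associative law, so the sketch takes 2x2 integer matrices, whose multiplication is associative though not commutative; subtraction of integers, which is not associative, is included for contrast. The helper names and the sample data are our own.

```python
def bracketings(items, op):
    """Results of every way of parenthesizing items[0] op ... op items[-1]."""
    if len(items) == 1:
        return [items[0]]
    results = []
    for k in range(1, len(items)):       # the last operation joins items[:k] and items[k:]
        for left in bracketings(items[:k], op):
            for right in bracketings(items[k:], op):
                results.append(op(left, right))
    return results


def matmul(p, q):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


matrices = [((1, 2), (3, 4)), ((0, 1), (1, 1)), ((2, 0), (1, 3)),
            ((1, -1), (2, 5)), ((3, 1), (0, 2))]
products = bracketings(matrices, matmul)                   # 14 arrangements of 5 factors
differences = bracketings([1, 2, 3], lambda u, v: u - v)   # (1 - 2) - 3 and 1 - (2 - 3)
```

All arrangements of the five matrix factors give one and the same product, whereas the two arrangements of the differences disagree.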

We can speak, in particular, about the product of n equal elements; 
that is, we can introduce the concept of a power, $a^n$, of the 
element a with positive integral exponent n. It is easy to verify that 
all the ordinary rules for operating with exponents hold true in any 
ring. Analogously, the associative law of addition leads to the 
concept of a multiple, na, of the element a by a positive integral 
coefficient n. 
The distributive law, that is, the usual rule for removing brackets, 
is the only requirement in the definition of a ring that connects addi- 
tion and multiplication; it is only through this law that the joint 
study of the two indicated operations yields more than could be 
obtained in their separate study. The statement of the distributive 
law involves the sum of only two terms. However, it can readily 
be proved that the equality 


$$(a_1 + a_2 + \ldots + a_k) b = a_1 b + a_2 b + \ldots + a_k b$$


holds for any k and that the general rule of multiplication of a sum 
by a sum is true. 

Also, the distributive law holds true in any ring for a difference as 
well. Indeed, by the definition of a difference, the element $a - b$ 
satisfies the equality 
$$b + (a - b) = a$$
Multiplying both sides of this equation by $c$ and applying the 
distributive law to the left member, we get 
$$bc + (a - b) c = ac$$
Element $(a - b) c$ is consequently the difference of the elements $ac$ 
and $bc$: 
$$(a - b) c = ac - bc$$


Very important properties of rings follow from the existence 
of subtraction. If $a$ is an arbitrary element of a ring $R$, then the 
difference $a - a$ will be some quite definite element of the ring. 
Its role is similar to that of zero in number rings, but, by definition, 
it may depend on the choice of the element $a$ and therefore we will 
provisionally denote it by $0_a$. 

We will prove that actually the elements $0_a$ are equal for all $a$. 
Indeed, if $b$ is some other arbitrary element of a ring $R$, then by 
adding the element $0_a$ to both sides of the equation 
$$a + (b - a) = b$$
and using the equation $0_a + a = a$, we get 
$$0_a + b = b$$
Thus, $0_a = b - b = 0_b$. 

We have proved that any ring $R$ possesses a uniquely defined element 
which, when added to any element $a$ of that ring, gives $a$. We call this 
element the zero element of the ring $R$ and we denote it by 0. We 
hope there is no real danger of confusing it with the number zero. 
Thus, 
$$a + 0 = a \quad \text{for all } a \text{ in } R$$
To continue, in any ring there exists for any element $a$ a uniquely 
defined inverse element $-a$ which satisfies the equation 
$$a + (-a) = 0$$
Namely, this element is the difference $0 - a$; the uniqueness follows 
from the uniqueness of subtraction. It is obvious that $-(-a) = a$. 
The difference $b - a$ of any two elements of a ring may now be 
written as 
$$b - a = b + (-a)$$
Indeed, 
$$[b + (-a)] + a = b + [(-a) + a] = b + 0 = b$$
For any element $a$ of the ring and for any positive integer $n$ we 
have the equality 
$$n (-a) = -(na)$$
And true enough, grouping the terms we get 
$$na + n (-a) = n [a + (-a)] = n \cdot 0 = 0$$
We are now in a position to define negative multiples of an element 
of a ring: if $n > 0$, then the equal elements $n (-a)$ and $-(na)$ 
will be denoted by $(-n) a$. Let us finally agree to use the term zero 
multiple $0 \cdot a$ of any element $a$ for the zero element of the ring under 
consideration. 

We have defined zero solely by means of the operation of addi- 
tion and its inverse, that is to say, without using multiplication. 
However, in the case of numbers, the number zero has a characte- 
ristic and very important property with respect to multiplication 
too. It turns out that this property is possessed by the zero element 
of every ring: in any ring the product of any element by zero is zero. 
The proof rests directly on the distributive law: if $a$ is an arbitrary 
element of a ring $R$, then no matter what the auxiliary element $x$ 
of this ring, we get 
$$a \cdot 0 = a (x - x) = ax - ax = 0$$


Using this property of zero, we can prove that in any ring the 
following equality holds for any elements a, b: 


(—a) b 


—ab 
True enough, 
ab + (—a) b = [a + (—a)] b = 0b = 0 


Hence the familiar yet somewhat mysterious rule for the
multiplication of negative numbers, “two negatives make a positive”,
is also a consequence of the definition of a ring; that is, in any
ring we have the equality

\[ (-a)(-b) = ab \]

Indeed,

\[ (-a)(-b) = -[a(-b)] = -(-ab) = ab \]


The reader will not find any difficulty now in proving that in any 
ring all the rules for operating with the multiples of any number hold 
true for the multiples (including negative multiples) of any element. 

Thus, the algebraic operations in an arbitrary ring have many 
of the familiar properties of operations on numbers. However, one 
should not think that every property of addition and multiplica- 
tion of numbers is preserved in any ring. For instance, the multi- 
plication of numbers has a property which is the converse of the 
one considered above: if a product of two numbers is equal to zero, 
then at least one of the factors is zero. This property cannot be 
carried over to all rings. In some rings we can find pairs of nonzero
elements whose product is equal to zero, that is, $a \neq 0$, $b \neq 0$, but
$ab = 0$; elements a and b with this property are called divisors
of zero.

Naturally, among the number rings one cannot find any instances 
of rings with zero divisors. Likewise there are no zero divisors among 
the rings of polynomials with numerical coefficients. However, many 
function rings have zero divisors. First of all, let us note that in any
function ring the zero is the function equal to zero for all values of the
variable x. Let us now construct the following functions $f(x)$ and
$g(x)$ defined for all real values of x:

\[ f(x) = 0 \text{ for } x \le 0, \qquad f(x) = x \text{ for } x > 0, \]
\[ g(x) = x \text{ for } x \le 0, \qquad g(x) = 0 \text{ for } x > 0 \]

Both functions are nonzero, since neither of them is equal to zero
for all values of x, but the product of these functions is zero.
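
The zero-divisor pair in the function ring can be checked numerically. The following is only a minimal sketch: it assumes the piecewise reading in which f vanishes for x <= 0 and g vanishes for x > 0, and the names `product` and `sample_points` are introduced here purely for illustration.

```python
def f(x: float) -> float:
    """f(x) = 0 for x <= 0 and f(x) = x for x > 0."""
    return x if x > 0 else 0.0


def g(x: float) -> float:
    """g(x) = x for x <= 0 and g(x) = 0 for x > 0."""
    return x if x <= 0 else 0.0


def product(x: float) -> float:
    """Pointwise product of f and g, i.e. multiplication in the function ring."""
    return f(x) * g(x)


# A finite sample cannot prove a statement about all real x; it only illustrates it.
sample_points = [-3.0, -1.5, -0.25, 0.0, 0.25, 1.5, 3.0]

f_is_nonzero = any(f(x) != 0 for x in sample_points)
g_is_nonzero = any(g(x) != 0 for x in sample_points)
product_vanishes = all(product(x) == 0 for x in sample_points)

print(f_is_nonzero, g_is_nonzero, product_vanishes)
```

At every point one of the two factors is zero, which is why the product is the zero function although neither factor is.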

Not all the requirements I to V that enter into the definition 
of a ring are necessary in equal measure. The development of mathe- 
matics shows that whereas the properties I and II of addition and 
the distributive law V occur in all applications, the inclusion of 
the multiplication properties III and IV in the definition of a ring 
is too confining and narrows the sphere of application of this con- 
cept. Thus, when the set of square matrices of order n with real 
elements is regarded with the operations of addition and multipli- 
cation of matrices, it satisfies all the requirements in the definition 
of a ring, with the exception of the commutative law of multipli- 
cation. Noncommutative multiplications are encountered so often 
and in such important instances that the term “ring” is now usually 
interpreted to mean a noncommutative ring (or, more precisely, a not 
necessarily commutative ring, in the sense of possible noncommutativity
of multiplication), and the special type of ring in which requirement
III is fulfilled is termed a commutative ring.

There has also been much interest recently in rings with nonas- 
sociative multiplication and the general theory of rings under con- 
struction is now a theory of nonassociative (that is to say, not neces- 
sarily associative) rings. An elementary instance of such a ring is 
the set of vectors of three-dimensional Euclidean space under the 
operations of the addition and (taken from the course of analytic 
geometry) the vector multiplication of vectors. 
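
That vector multiplication is not associative can be seen on the coordinate unit vectors. The sketch below is illustrative only; the helper `cross` and the names `i`, `j` are ours, and the component formula is the usual one from analytic geometry.

```python
from typing import Tuple

Vector = Tuple[int, int, int]


def cross(u: Vector, v: Vector) -> Vector:
    """Vector (cross) product of two vectors given by their components."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


i: Vector = (1, 0, 0)
j: Vector = (0, 1, 0)

left = cross(cross(i, i), j)   # (i x i) x j, where i x i is the zero vector
right = cross(i, cross(i, j))  # i x (i x j)

print(left, right)
```

The two groupings give different vectors, so the associative law fails, while addition and the distributive law are unaffected.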


45. Fields


In the set of number rings, we singled out and gave the name 
number fields to those rings which admit division (except by zero). 
It is natural to do this in the general case as well. First note that 
no ring admits division by zero in virtue of the above-proved property 
of zero under multiplication: to divide an element a by zero means 
to find, in that ring, an element x such that $0 \cdot x = a$, which for
$a \neq 0$ is impossible, since the left-hand side is equal to zero.

Let us introduce the following definition. 

A ring P is termed a field if it consists of more than zero alone
and if division can be performed uniquely in all cases except division
by zero; that is to say, for any elements a and b in P, $b \neq 0$,
there is in P a unique element q which satisfies the equality $bq = a$.
The element q is called the quotient of the elements a and b and is
denoted by the symbol $q = \frac{a}{b}$.


Quite naturally, all number fields are instances of fields. A ring 
of polynomials in the unknown x with real coefficients and, gene- 
rally, with coefficients taken from some number field, is not a field. 
The division with a remainder that polynomials have differs of 
course from exact division, which is assumed in the definition of 
a field. On the other hand, it is easy to see that the set of all fractional 
rational functions with real coefficients (see Sec. 25) will be a field 
containing the ring of polynomials, just like the field of rational 
numbers contains the ring of integers. 

We could point to certain other instances of fields within the 
ring of functions, but instead we will examine examples of quite 
a different sort. 

All the number rings, and in general all the rings we have considered
so far, contain infinitely many elements. There are, however,
rings and even fields consisting only of a finite number of elements.
The simplest examples of finite rings and finite fields, which are
essential objects in the theory of numbers, are constructed in the
following manner.

* The uniqueness of division in a field, just like the assumed uniqueness
of subtraction in the definition of a ring, can actually be proved, without
being assumed, by means of the requirements that enter into the definition
of a field (or ring).

Take any natural number n different from 1. The integers a and b
are called congruent modulo n,

\[ a \equiv b \pmod{n} \]

if these numbers yield the same remainder when divided by n,
that is to say, if their difference is exactly divisible by n. The entire
ring of integers is separated into n mutually exclusive (nonintersecting)
classes

\[ C_0, C_1, \ldots, C_{n-1} \tag{1} \]

of numbers congruent modulo n; the class $C_k$, $k = 0, 1, \ldots, n - 1$,
consists of the numbers which yield, upon division by n, the remainder
k. It turns out that it is possible, in a very natural way, to define
the addition and multiplication of these classes.

For this purpose, let us take any (not necessarily distinct) classes
$C_k$ and $C_l$ from the system (1). Adding any number of class $C_k$ to
any number of class $C_l$, we obtain numbers lying in one quite definite
class, namely, in the class $C_{k+l}$ if $k + l < n$, or in the class
$C_{k+l-n}$ if $k + l \ge n$. This leads to the following definition of the
addition of classes:

\[
C_k + C_l =
\begin{cases}
C_{k+l} & \text{for } k + l < n, \\
C_{k+l-n} & \text{for } k + l \ge n
\end{cases}
\tag{2}
\]


On the other hand, multiplying any number of class $C_k$ by any
number of class $C_l$, we get numbers lying in a definite class, namely
the class $C_r$, where r is the remainder left after dividing the product
kl by n. We thus have the following definition of the multiplication
of classes:

\[ C_k \cdot C_l = C_r, \quad \text{where } kl = nq + r, \ 0 \le r < n \tag{3} \]
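
The definitions (2) and (3) can be sketched in a few lines. This is an illustration under the assumption that a class is represented by its remainder k with 0 <= k < n; the function names `add_classes` and `mul_classes` are hypothetical and not taken from the text.

```python
def add_classes(k: int, l: int, n: int) -> int:
    """Index of the class C_k + C_l according to definition (2)."""
    s = k + l
    return s if s < n else s - n


def mul_classes(k: int, l: int, n: int) -> int:
    """Index r of the class C_k * C_l according to definition (3): kl = nq + r."""
    q, r = divmod(k * l, n)
    return r


def class_of(a: int, n: int) -> int:
    """Index of the class containing the integer a (its remainder modulo n)."""
    return a % n


# The result does not depend on which representatives of the classes are used.
n = 6
a, b = 4 + 3 * n, 5 - 2 * n  # some members of C_4 and C_5
print(class_of(a + b, n), add_classes(4, 5, n))
print(class_of(a * b, n), mul_classes(4, 5, n))
```

The comparison of `class_of(a + b, n)` with `add_classes` is exactly the remark made above: sums and products of arbitrary representatives always land in one definite class.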


The system (1) of classes of integers congruent modulo n is a ring
with respect to the operations defined by the conditions (2) and (3).
Indeed, requirements I-V of the definition of a ring can readily be
verified directly, but their validity also follows from the truth of
these requirements in the ring of integers and from the relationship,
indicated above, between operations on integers and operations
on classes. Zero is obviously the class $C_0$ consisting of the numbers
exactly divisible by n. The class opposite to $C_k$, $k = 1, 2, \ldots, n - 1$, is
the class $C_{n-k}$. In the system (1) of classes it is thus possible to
define subtraction, that is, this system satisfies all the requirements
of the definition of a ring. Let us agree to denote the resulting ring
by $Z_n$.

If the number n is composite, then the ring $Z_n$ possesses
zero divisors and therefore, as will be shown below, it cannot be a
field. Indeed, if $n = kl$ where $1 < k < n$, $1 < l < n$, then the classes
$C_k$ and $C_l$ are different from the zero class $C_0$, but by the definition
of the multiplication of classes [see (3)], $C_k \cdot C_l = C_0$.

But if the number n is prime, then the ring $Z_n$ is a field.

To see this, let there be classes $C_k$ and $C_m$, $C_k \neq C_0$, i.e.,
$1 \le k \le n - 1$. We have to show that it is possible to divide $C_m$ by
$C_k$, that is, to find a class $C_l$ such that $C_k \cdot C_l = C_m$. If $C_m = C_0$, then
$C_l = C_0$ as well. But if $C_m \neq C_0$, then we consider the set of numbers


\[ k, 2k, 3k, \ldots, (n - 1)k \tag{4} \]

All these numbers lie outside the zero class $C_0$, since the product of
two natural numbers less than a prime n is not divisible by n. Also,
no two numbers sk and tk from (4), $s < t$, can be in one class, for
then their difference

\[ tk - sk = (t - s)k \]

would be divisible by n, which again is in conflict with the primality
of the number n. Thus every nonzero class contains exactly one number
from the set (4). In particular, the class $C_m$ contains a number
lk, where $1 \le l \le n - 1$, that is, $C_l \cdot C_k = C_m$, and the class $C_l$
will then be the desired quotient resulting from the division of $C_m$ by $C_k$.
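
Both cases, composite and prime n, can be explored by direct search. The sketch below follows the argument literally by scanning the list k, 2k, ..., (n - 1)k; the function names are hypothetical, classes are again represented by remainders, and the linear scan is chosen for fidelity to the proof rather than for efficiency.

```python
from typing import List, Tuple


def zero_divisor_pairs(n: int) -> List[Tuple[int, int]]:
    """All pairs (k, l) of nonzero classes of Z_n with C_k * C_l = C_0."""
    return [(k, l) for k in range(1, n) for l in range(1, n) if (k * l) % n == 0]


def divide(m: int, k: int, n: int) -> int:
    """Index l with C_k * C_l = C_m, found by scanning k, 2k, ..., (n - 1)k."""
    if k % n == 0:
        raise ZeroDivisionError("division by the zero class")
    if m % n == 0:
        return 0
    for l in range(1, n):
        if (l * k) % n == m % n:
            return l
    # Reached only if no multiple of k lies in C_m, which cannot happen for prime n.
    raise ValueError("no quotient exists in Z_n")


print(zero_divisor_pairs(6))  # nonempty, since 6 is composite
print(zero_divisor_pairs(7))  # empty, since 7 is prime
print(divide(3, 5, 7))        # the class C_l with C_5 * C_l = C_3 in Z_7
```

For a prime modulus the scan always succeeds, which is the content of the counting argument above; for a composite modulus it may fail, for example when dividing C_1 by C_2 in Z_6.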

We have thus obtained an infinity of distinct finite fields: the
field $Z_2$, consisting of only two elements, and also the fields $Z_3$, $Z_5$,
$Z_7$, $Z_{11}$ and so on.

Let us examine some properties of fields that follow from the 
existence of division. These properties are similar to those of rings 
based on the existence of subtraction and are demonstrated by the 
same arguments, and so the proof will be left to the reader. 

Every field P has a uniquely defined element whose product by any
element a of the field is equal to a. This element, which coincides with
the equal quotients $\frac{a}{a}$ for all nonzero a, is called the unity (unit)
element of the field P and is denoted by 1. Thus,

\[ a \cdot 1 = a \quad \text{for all } a \text{ in } P \]

For every nonzero element a, there is, in every field, a unique inverse
element $a^{-1}$ which satisfies the equality

\[ a \cdot a^{-1} = 1 \]

namely, $a^{-1} = \frac{1}{a}$. It is obvious that $(a^{-1})^{-1} = a$. The quotient
$\frac{b}{a}$ may now be written in the form

\[ \frac{b}{a} = b \cdot a^{-1} \]

For any element a different from zero and for any positive integer n
we have the equality

\[ (a^{-1})^n = (a^n)^{-1} \]

Denoting these equal elements by $a^{-n}$, we arrive at negative powers
of an element of the field for which the ordinary operating rules hold.
Let us finally agree that $a^0 = 1$ for all a.

The existence of a unit element is not a characteristic property
of fields: the ring of integers, for instance, has a unit element. Yet
the example of the ring of even numbers shows that not all rings
possess a unit element. On the other hand, any ring possessing a unit
element and an inverse for every nonzero element is a field. Indeed, in
this case the quotient $\frac{b}{a}$, $a \neq 0$, is given by the product $ba^{-1}$. It
is easy to prove the uniqueness of this quotient.

Notice that no field has zero divisors. Let $ab = 0$, but $a \neq 0$.
Multiplying both sides of the equality by the element $a^{-1}$, we get
$(a^{-1}a)b = 1 \cdot b = b$ on the left and $a^{-1} \cdot 0 = 0$ on the right, so $b = 0$.
From this it follows that in any field any equality may be divided by a
common nonzero factor. This is so, since if $ac = bc$ and $c \neq 0$, then
$(a - b)c = 0$, whence $a - b = 0$, or $a = b$.


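From the definition of the quotient $\frac{a}{b}$ (where $b \neq 0$) and from
the above-proved possibility of writing it as the product $ab^{-1}$, it is
easy to see that all the ordinary rules for handling fractions hold true
in any field, namely:

\[ \frac{a}{b} = \frac{c}{d} \quad \text{if and only if } ad = bc, \]
\[ \frac{a}{b} \pm \frac{c}{d} = \frac{ad \pm bc}{bd}, \]
\[ \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}, \]
\[ \frac{-a}{b} = -\frac{a}{b} \]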


The characteristic of a field. Not all properties of number fields
hold true in the case of arbitrary fields. For instance, if we take the
number 1 and add it to itself several times, that is, if we take any
positive integral multiple of one, we will never get zero, and, generally,
all these multiples (that is, all natural numbers) are distinct. But if
we take integral multiples of unity in some finite field, then there will
invariably be equal integral multiples, since the field has only a finite
number of distinct elements. If all the integral multiples of unity of a
field P are distinct elements of P, that is, $k \cdot 1 \neq l \cdot 1$ for $k \neq l$, then
we say that the field P has characteristic zero. Such, for example, are
all the number fields. But if there exist integers k and l such that $k > l$,
but in P we have the equality $k \cdot 1 = l \cdot 1$, then $(k - l) \cdot 1 = 0$,
i.e., there exists in P a positive multiple of unity which is equal
to zero. In this case P is called a field of finite characteristic, namely
of characteristic p, if p is the least positive coefficient with which
the unit element of the field P vanishes. All finite fields are examples
of fields of finite characteristic. Incidentally, there also exist
infinite fields having a finite characteristic.

If a field P has a characteristic p, then the number p is prime. 

Indeed, from the equality $p = st$, where $s < p$, $t < p$, would
follow the equality $(s \cdot 1)(t \cdot 1) = p \cdot 1 = 0$; that is to say, since a
field cannot have zero divisors, either $s \cdot 1 = 0$ or $t \cdot 1 = 0$,
which, however, runs counter to the definition of the characteristic as
the least positive coefficient which makes the unit element of the
field vanish.
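
The definition of the characteristic can be tried out on the rings of residue classes from earlier in this section. In the hedged sketch below, the unit element is taken to be the class of 1 and multiples of unity are formed by repeated addition; `characteristic_of_Zn` and `is_prime` are names invented for this illustration.

```python
def characteristic_of_Zn(n: int) -> int:
    """Least positive p with p * 1 = 0 in Z_n, found by repeatedly adding unity."""
    if n < 2:
        raise ValueError("n must be a natural number different from 1")
    multiple = 1 % n  # the multiple 1 * 1
    count = 1
    while multiple != 0:
        multiple = (multiple + 1) % n  # pass from count * 1 to (count + 1) * 1
        count += 1
    return count


def is_prime(p: int) -> bool:
    """Trial-division primality test, adequate for small p."""
    return p >= 2 and all(p % d != 0 for d in range(2, int(p ** 0.5) + 1))


for p in (2, 3, 5, 7, 11):
    print(p, characteristic_of_Zn(p), is_prime(characteristic_of_Zn(p)))

# In the ring Z_6, which is not a field, 6 * 1 = 0 with 6 = 2 * 3,
# and accordingly (2 * 1)(3 * 1) = 0 exhibits a pair of zero divisors.
print(characteristic_of_Zn(6), (2 * 3) % 6)
```

The last line mirrors the proof: a composite value could only occur in the presence of zero divisors, which a field does not have.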

If the characteristic of a field P is equal to p, then for any element a
of the field we have the equality $pa = 0$. But if the characteristic of
the field P is 0 and a is an element of the field, n an integer, then from
$a \neq 0$ and $n \neq 0$ it follows that $na \neq 0$.

Indeed, in the first case the element pa (that is, the sum of p
terms equal to a) can, by factoring out a, be represented as

\[ pa = a(p \cdot 1) = a \cdot 0 = 0 \]

In the second case, from the equality $na = 0$, that is, $a(n \cdot 1) = 0$,
and from $a \neq 0$ we would get $n \cdot 1 = 0$; that is, since the characteristic
of the field is zero, $n = 0$.

Subfields, extensions. Suppose in the field P a portion of the
elements (some set P') is itself a field with respect to the operations
defined in P; that is to say, for any two elements a, b in P', the
elements (in the field P) $a + b$, $ab$, $a - b$, and, for $b \neq 0$, $\frac{a}{b}$ belong
to P' (the laws I to V will of course hold in P' since they hold in P).
Then P' is a subfield of the field P, and P is an extension of the field P'.
Quite naturally, the zero and unity of P will lie in P' as well and
will also serve in P' as zero and unity. Thus, the field of rational
numbers is a subfield of the field of real numbers; all number fields
are subfields of the field of complex numbers.

Let there be given in the field P a subfield P' and an element c
exterior to P', and suppose we have a minimum subfield P'' of P
which contains both P' and c. There can only be one such minimum
subfield, since if P''' were one more subfield with these properties,
then the intersection of the subfields P'' and P''' (i.e., the collection
of elements common to both subfields) would contain P' and the
element c and, together with any two of its elements, it would
contain their sum (this sum must lie both in P'' and in P''', and so
also in their intersection) and likewise their product, difference and
quotient; in other words the intersection would itself be a subfield,
but this contradicts the minimality of the subfield P''. We will say
that the field P'' is obtained by adjoining the element c to the field P';
symbolically, we write P'' = P'(c).

The field P'(c) naturally contains, besides the element c and
all the elements of the field P', also all the elements which are
derived from them by the operations of addition, multiplication,
subtraction and division. By way of illustration, recall the extension
(considered in Sec. 43) of the field of rational numbers consisting
of numbers of the form $a + b\sqrt{2}$ with rational a, b; this extension
results from adjoining the number $\sqrt{2}$ to the field of rational numbers.


46. Isomorphisms of Rings (Fields). 
The Uniqueness of the Field of Complex Numbers 


The concept of an isomorphism plays an important role in the 
theory of rings. Namely, the rings L and L’ are called isomorphic if 
a one-to-one correspondence can be set up between them such that 
for any elements a, b in L and for the corresponding elements a', b'
in L', the sum $a + b$ corresponds to the sum $a' + b'$, and the product
$ab$ corresponds to the product $a'b'$.

Suppose an isomorphic correspondence exists between the rings
L and L'. In this correspondence, the zero 0 of L corresponds to the
zero 0' of L'. Indeed, suppose the element 0 is associated with an
element c' of L'. Take an arbitrary element a of L and the associated
element a' of L'. Then to the element $a + 0$ there has to correspond
the element $a' + c'$; but $a + 0 = a$, and so $a' + c' = a'$, whence
$c' = 0'$. Furthermore, the element $-a$ is associated with the element
$-a'$. Indeed, let the element $-a$ be associated with the element d'.
Then to the element $a + (-a) = 0$ there will have to correspond
the element $a' + d'$, that is, $a' + d' = 0'$, whence $d' = -a'$. This
implies that to a difference of elements in L there corresponds the
difference of the corresponding elements of L'. By similar arguments it
may be shown that if the ring L has a unit element, then the image
of this element (i.e., the element corresponding to it in L' under
the given isomorphism) will be the unit element of the ring L',
and if the element a from L has the inverse $a^{-1}$, then in L' the image
of $a^{-1}$ is the inverse element of a'.

This implies that a ring isomorphic to a field is itself a field. It
is also easy to see that the absence of zero divisors in a ring
is preserved under an isomorphic correspondence. Generally speaking,
isomorphic rings can differ as to the nature of their elements, but
they are identical with respect to their algebraic properties. Any
theorem which has been proved relative to some ring will hold true
for all rings isomorphic to that ring, provided that the proof does
not involve any individual properties of the elements of the ring
but only the properties of the operations. For this reason we will
not consider isomorphic rings or fields to be distinct; for us they will
simply be different copies of one and the same ring or field.

Let us apply this concept to the problem of constructing the 
field of complex numbers. The construction, given in Sec. 17, of 
the field of complex numbers was based on the use of points in the 
plane. This is not the only possible construction. In place of points, 
we could have taken line segments (vectors) in the plane that emanate 
from the coordinate origin, and by specifying these vectors via their 
components a, b on the coordinate axes, we could have defined addi- 
tion and multiplication of the vectors with the aid of the same formu- 
las (2) and (3) of Sec. 17, as in the case of points in the plane. We 
could have gone further still and dispensed with geometrical material
altogether: noting that points in a plane and also vectors in a
plane can be represented by ordered pairs of real numbers (a, b), we
could simply take the collection of all such pairs and introduce
addition and multiplication via formulas (2) and (3) of that section.

With respect to their algebraic properties, all these fields would
be indistinguishable, as the following theorem shows.

All extensions of the field D of real numbers derived by adjoining
to D a root of the equation

\[ x^2 + 1 = 0 \tag{1} \]

are isomorphic among themselves.

Indeed, suppose we have a field P which is an extension of the
field D and contains an element satisfying equation (1). The choice
of notation for this element is up to us, and we use the letter i. We thus
get the equation $i^2 + 1 = 0$ (whence $i^2 = -1$), where squaring
and addition are to be understood in the sense of the operations
defined in the field P. We now want to find the field D(i) obtained
by adjoining the element i to the field D, that is, we wish to find the
minimal subfield of the field P containing both D and the element i.

For this purpose, let us examine all the elements $\alpha$ of the field
P which can be written in the form

\[ \alpha = a + bi \tag{2} \]

where a and b are arbitrary real numbers, and the product of the
number b by the element i and the sum of the number a and this product
are to be understood in the sense of the operations defined in
the field P. No element $\alpha$ of P can possess two different representations
of that form: from

\[ \alpha = a + bi = a' + b'i \]

and $b \neq b'$ there would follow

\[ i = \frac{a - a'}{b' - b} \]

that is, i would be a real number; but if $b = b'$, then $a = a'$. In
particular, the elements of P written as (2) include all real numbers
(the case $b = 0$) and also the element i (the case $a = 0$, $b = 1$).

We will now show that the collection of all elements of type (2)
constitutes a subfield of the field P. This will then be the desired
field D(i). Suppose we have the elements $\alpha = a + bi$ and $\beta = c + di$.
Then, using the commutativity and associativity of addition
and the distributive law, all of which hold in P, we get

\[ \alpha + \beta = (a + bi) + (c + di) = (a + c) + (bi + di) \]

whence

\[ \alpha + \beta = (a + c) + (b + d)i \tag{3} \]

Thus, this sum again belongs to the set of elements under consideration.
Furthermore,

\[ -\beta = (-c) + (-d)i \]

since, by (3), the equality $\beta + (-\beta) = 0 + 0i = 0$ holds true.
Therefore

\[ \alpha - \beta = \alpha + (-\beta) = (a - c) + (b - d)i \tag{3'} \]

That is to say, this set is also closed under subtraction. Again using
properties I to V, which hold for operations in the field P
(see Sec. 44), and relying on the equality $i^2 = -1$, we get

\[ \alpha\beta = (a + bi)(c + di) = ac + adi + bci + bdi^2 \]

that is,

\[ \alpha\beta = (ac - bd) + (ad + bc)i \tag{4} \]


Thus the product of any two elements of the type (2) is again an element
of this type. Finally, suppose that $\beta \neq 0$, i.e., at least one of
the numbers c, d is nonzero. Then we will also have $c - di \neq 0$ and

\[ (c + di)(c - di) = c^2 - (di)^2 = c^2 - d^2 i^2 = c^2 + d^2 \]

and $c^2 + d^2 \neq 0$. Therefore, using the assertion (stated in the
preceding section) that all the ordinary rules of handling fractions
hold true in any field, and thus, in particular, that a fraction remains
unchanged when the numerator and denominator are multiplied
by the same nonzero element, we obtain

\[ \frac{\alpha}{\beta} = \frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{(c + di)(c - di)} = \frac{(ac + bd) + (bc - ad)i}{c^2 + d^2} \]
That is to say, the element

\[ \frac{\alpha}{\beta} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\, i \tag{4'} \]

again has the form (2).

We will now show that the subfield D(i) which we have derived
from the field P is isomorphic to that field of points in a plane that was
constructed in Sec. 17. Associating with the element $a + bi$ of the
field D(i) the point (a, b), we obtain [due to the uniqueness, just
proved, of the notation (2) for elements of the field D(i)] a one-to-one
correspondence between the elements of this field and all the
points in the plane. In this correspondence, the real number a is
associated with the point (a, 0) because of the equality $a = a + 0i$,
and the element $i = 0 + 1 \cdot i$ is associated with the point (0, 1). On
the other hand, comparing formulas (3) and (4) of this section with
formulas (2) and (3) of Sec. 17, we find that the sum and product of
the elements $\alpha$ and $\beta$ of the field D(i) are correlated with the points
which are the sum and, respectively, the product of the points
associated with the elements $\alpha$ and $\beta$.
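
The formulas (3), (4) and (4') of this section can be exercised directly on ordered pairs (a, b). The sketch below is an illustration only: it uses rational rather than real components so that the arithmetic is exact, and the names `add`, `mul` and `div` are ours.

```python
from fractions import Fraction
from typing import Tuple

Pair = Tuple[Fraction, Fraction]  # the pair (a, b) stands for a + bi


def add(alpha: Pair, beta: Pair) -> Pair:
    """Sum according to formula (3)."""
    (a, b), (c, d) = alpha, beta
    return (a + c, b + d)


def mul(alpha: Pair, beta: Pair) -> Pair:
    """Product according to formula (4)."""
    (a, b), (c, d) = alpha, beta
    return (a * c - b * d, a * d + b * c)


def div(alpha: Pair, beta: Pair) -> Pair:
    """Quotient according to formula (4'); beta must be nonzero."""
    (a, b), (c, d) = alpha, beta
    norm = c * c + d * d
    if norm == 0:
        raise ZeroDivisionError("division by the zero element")
    return (Fraction(a * c + b * d) / norm, Fraction(b * c - a * d) / norm)


i = (Fraction(0), Fraction(1))
alpha = (Fraction(1), Fraction(2))
beta = (Fraction(3), Fraction(-4))

print(mul(i, i))                   # the pair corresponding to -1
print(div(mul(alpha, beta), beta)) # recovers alpha
```

Dividing a product by one of its factors returns the other factor, which is a quick consistency check between (4) and (4').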

This completes the proof of the theorem, since all fields that are
isomorphic to some given field are isomorphic among themselves.
In particular, we see that the choice (in Sec. 17) of formulas (2) and
(3) for determining operations involving points was not accidental
and cannot be altered.

There are many other ways of constructing the field of complex 
numbers. Let us examine one which uses the addition and multi- 
plication of matrices. 

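We consider the noncommutative ring of second-order matrices
over the field of real numbers. It is obvious that the scalar matrices

\[ \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} \]

constitute in this ring a subfield that is isomorphic to the field of
real numbers. It turns out, however, that in the ring of second-order
matrices over the field of reals, we can also find a subfield that is
isomorphic to the field of complex numbers. Indeed, associate with every
complex number $a + bi$ the matrix

\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \]

In this way, the entire field of complex numbers is mapped one-to-one
onto a part of the ring of second-order matrices, and from the equations

\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} + \begin{pmatrix} c & d \\ -d & c \end{pmatrix} = \begin{pmatrix} a + c & b + d \\ -(b + d) & a + c \end{pmatrix}, \]
\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \begin{pmatrix} c & d \\ -d & c \end{pmatrix} = \begin{pmatrix} ac - bd & ad + bc \\ -(ad + bc) & ac - bd \end{pmatrix} \]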







it follows that this mapping is isomorphic, since the matrices in
the right-hand members correspond to the complex numbers
$(a + c) + (b + d)i = (a + bi) + (c + di)$ and
$(ac - bd) + (ad + bc)i = (a + bi)(c + di)$. In particular, the role of
the imaginary unit i is played by the matrix

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]
The foregoing result indicates yet another possible way of constructing
the field of complex numbers that is just as satisfactory
as those considered earlier.
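
The matrix construction can be verified mechanically on a small grid of values. The following sketch uses integer entries for exactness and plain nested tuples for matrices; `to_matrix`, `mat_add` and `mat_mul` are hypothetical helper names introduced only for this check.

```python
from typing import Tuple

Matrix = Tuple[Tuple[int, int], Tuple[int, int]]


def to_matrix(a: int, b: int) -> Matrix:
    """Matrix associated with the complex number a + bi."""
    return ((a, b), (-b, a))


def mat_add(x: Matrix, y: Matrix) -> Matrix:
    """Entrywise sum of two second-order matrices."""
    return tuple(tuple(x[r][c] + y[r][c] for c in range(2)) for r in range(2))


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Row-by-column product of two second-order matrices."""
    return tuple(
        tuple(sum(x[r][t] * y[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )


J = to_matrix(0, 1)  # plays the role of the imaginary unit i
print(mat_mul(J, J))  # the scalar matrix corresponding to -1

# Sums and products of the matrices match sums and products of complex numbers.
a, b, c, d = 1, 2, 3, -4
print(mat_mul(to_matrix(a, b), to_matrix(c, d)) == to_matrix(a * c - b * d, a * d + b * c))
```

Although the full ring of second-order matrices is noncommutative, matrices of this special form commute with one another, as they must if they are to form a field.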


47. Linear Algebra and the Algebra of Polynomials 
over an Arbitrary Field 


In the earlier chapters of this book devoted to linear algebra, the 
base field was the field of real numbers. It is easy to verify, however, 
that much of what was written in those chapters can be carried over 
word for word to the case of an arbitrary base field. 

Thus, for an arbitrary base field P, the Gaussian method for solving
systems of linear equations, the theory of determinants and Cramer's 
rule, which were given in Chapter 1, all hold true. It is only the remark 
concerning skew-symmetric determinants (at the end of Sec. 4) 
which requires the assumption that the characteristic of the field 
P is different from two. Incidentally, the proof of Property 4 (same 
section) also breaks down if the characteristic of the field P is equal 
to two, though the property itself holds true. 

It is also useful to note that the assertion (mentioned repeatedly 
in Chapter 1) on the existence of an infinity of distinct solutions to 
an indeterminate system of linear equations holds true in the case 
of any infinite base field P, but ceases to hold if P is finite. 

The following carry over completely to the case of an arbitrary 
base field: the theory of linear dependence of vectors, the theory of the 
rank of a matrix and the general theory of systems of linear equations 
(see Chapter 2), and also the algebra of matrices (Chapter 3). 

The general theory of quadratic forms constructed in Sec. 26 is 
carried over to the case of any base field P whose characteristic is different 
from two. As can be readily demonstrated, the fundamental theorem 
of this section ceases to hold without this restriction. 

For example, let $P = Z_2$, that is, let P be a field consisting of
two elements 0 and 1; here $1 + 1 = 0$, whence $-1 = 1$. Let there
be a quadratic form $f = x_1 x_2$ over this field. If there exists a linear
transformation

\[ x_1 = b_{11} y_1 + b_{12} y_2, \]
\[ x_2 = b_{21} y_1 + b_{22} y_2 \]

which reduces f to canonical form, then in the equation

\[ f = (b_{11} y_1 + b_{12} y_2)(b_{21} y_1 + b_{22} y_2) = b_{11} b_{21} y_1^2 + (b_{11} b_{22} + b_{12} b_{21}) y_1 y_2 + b_{12} b_{22} y_2^2 \]

the coefficient $b_{11} b_{22} + b_{12} b_{21}$ of the product $y_1 y_2$ must be equal to
zero. But this coefficient is equal to the determinant of the linear
transformation that we took, since irrespective of whether $b_{12} b_{21} = 1$
or $b_{12} b_{21} = 0$, we have $b_{12} b_{21} = -b_{12} b_{21}$ in both cases. Our linear
transformation turned out to be singular.
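
Since the field of two elements admits only sixteen second-order matrices, the claim can also be confirmed by exhaustive search. This is a brute-force sketch, not part of the original argument; the function names are ours, and entries are represented by the integers 0 and 1 with arithmetic taken modulo 2.

```python
from itertools import product


def cross_coefficient(b11: int, b12: int, b21: int, b22: int) -> int:
    """Coefficient of y1*y2 in f = x1*x2 after the substitution, computed in Z_2."""
    return (b11 * b22 + b12 * b21) % 2


def determinant(b11: int, b12: int, b21: int, b22: int) -> int:
    """Determinant of the transformation, computed in Z_2."""
    return (b11 * b22 - b12 * b21) % 2


all_matrices = list(product((0, 1), repeat=4))

# Nonsingular transformations that would remove the y1*y2 term.
diagonalizing = [
    m for m in all_matrices
    if determinant(*m) == 1 and cross_coefficient(*m) == 0
]
print(len(all_matrices), diagonalizing)
```

The search finds no nonsingular transformation that removes the mixed term, because over this field the mixed coefficient and the determinant coincide for every matrix.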

The rest of Chapter 6 is largely devoted to quadratic forms with 
complex or real coefficients. 

Finally, the entire theory of linear spaces and their linear trans- 
formations which was constructed in Chapter 7 holds true for the case 
of an arbitrary base field P. Incidentally, the concept of a characteri- 
stic root is connected with the theory of polynomials over an arbi- 
trary field (this will be discussed below). Notice that the theorem, 
in Sec. 33, on the relationship between characteristic roots and 
eigenvalues will now be formulated as follows: the characteristic 
roots of a linear transformation $\varphi$ which lie in the base field P, and
they alone, serve as the eigenvalues of this transformation. 

Now the theory of Euclidean spaces (Chapter 8) is essentially 
connected with the field of real numbers. 

We can also extend to the case of an arbitrary base field P certain
of the above-discussed sections of the algebra of polynomials. Howe- 
ver, it is first necessary to make precise the meaning of the concept 
of a polynomial over an arbitrary field. 

In Sec. 20 we indicated two viewpoints concerning the concept 
of a polynomial: the formal-algebraic view and the function-theore- 
tic view. Both can be transferred to the case of an arbitrary base 
field. However, though they are equivalent in the case of number 
fields (see Sec. 24), and, as can readily be verified, of infinite fields 
in general, they cease to be equivalent in the case of finite fields. 

Consider, for instance, the field $Z_2$ introduced in Sec. 45 and
consisting of two elements 0 and 1 with $1 + 1 = 0$. The polynomials
$x + 1$ and $x^2 + 1$ with coefficients from this field are distinct;
that is to say, they do not satisfy the algebraic definition of equality
of polynomials. Yet, for $x = 0$, both these polynomials become 1,
and for $x = 1$ they have the value 0, that is to say, they must be
considered equal as “functions” of the “variable” x, which takes on
values in the field $Z_2$. In the field $Z_3$, consisting of three elements
0, 1, 2, with $1 + 2 = 0$, the situation is the same relative to the
polynomials $x^3 + x + 1$ and $2x + 1$. Examples of this type can,
generally, be indicated for all finite fields.
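
Both examples can be checked by evaluating the polynomials at every element of the field. In this sketch a polynomial is a tuple of coefficients in ascending powers and field elements are remainders modulo p; `evaluate` and `same_function` are illustrative names, not notation from the text.

```python
from typing import Sequence


def evaluate(coeffs: Sequence[int], x: int, p: int) -> int:
    """Value at x of the polynomial with the given ascending coefficients, in Z_p."""
    value = 0
    for c in reversed(coeffs):  # Horner's scheme, highest power first
        value = (value * x + c) % p
    return value


def same_function(f: Sequence[int], g: Sequence[int], p: int) -> bool:
    """True if f and g take equal values at every element of Z_p."""
    return all(evaluate(f, x, p) == evaluate(g, x, p) for x in range(p))


# Over Z_2: x + 1 and x^2 + 1.
f2, g2 = (1, 1), (1, 0, 1)
print(f2 == g2, same_function(f2, g2, 2))

# Over Z_3: x^3 + x + 1 and 2x + 1.
f3, g3 = (1, 1, 0, 1), (1, 2)
print(f3 == g3, same_function(f3, g3, 3))
```

In each pair the coefficient tuples differ, so the polynomials are distinct in the formal-algebraic sense, while their values agree at every point of the field.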

Thus, in the case of an arbitrary field P, one cannot accept the
function-theoretic view of polynomials. It consequently becomes
necessary to make explicit the formal-algebraic definition of a polynomial.
For this purpose, we will construct a ring of polynomials
over an arbitrary field P in a way that dispenses, from the very start,
with the ordinary notation of polynomials in terms of an “unknown” x.


Consider all possible ordered finite systems of elements of the
field P having the form

\[ (a_0, a_1, \ldots, a_{n-1}, a_n) \tag{1} \]

Here, n is arbitrary, $n \ge 0$, but for $n > 0$ it must be true that
$a_n \neq 0$. Defining addition and multiplication for systems of the form
(1) in accord with formulas (3) and (4), Sec. 20, we convert the collection
of these systems into a commutative ring; the necessary proofs
of the properties repeat word for word what was accomplished for
number polynomials in Sec. 20.

In the ring we have constructed, systems of the form (a) (the 
case n = 0) constitute a subfield isomorphic to the field P. This 
permits identifying such systems with corresponding elements a 
of the field P, that is, setting 


\[ (a) = a \quad \text{for all } a \text{ in } P \tag{2} \]

On the other hand, denote the system (0, 1) by the letter x,

\[ x = (0, 1) \]

Then, applying the above-indicated definition of multiplication, we
find that $x^2 = (0, 0, 1)$ and, generally,

\[ x^k = (\underbrace{0, 0, \ldots, 0}_{k \text{ times}}, 1) \tag{3} \]


Now using the definitions of addition and multiplication of
ordered systems, and also equalities (2) and (3), we get

\[
\begin{aligned}
(a_0, a_1, a_2, \ldots, a_{n-1}, a_n)
&= (a_0) + (0, a_1) + (0, 0, a_2) + \ldots
 + (\underbrace{0, 0, \ldots, 0}_{n-1 \text{ times}}, a_{n-1})
 + (\underbrace{0, 0, \ldots, 0}_{n \text{ times}}, a_n) \\
&= (a_0) + (a_1)(0, 1) + (a_2)(0, 0, 1) + \ldots
 + (a_{n-1})(\underbrace{0, 0, \ldots, 0}_{n-1 \text{ times}}, 1)
 + (a_n)(\underbrace{0, 0, \ldots, 0}_{n \text{ times}}, 1) \\
&= a_0 + a_1 x + a_2 x^2 + \ldots + a_{n-1} x^{n-1} + a_n x^n
\end{aligned}
\]

Thus, any ordered system of type (1) can be written as a polynomial
in x with coefficients from the field P, and this notation will
evidently be unique. Finally, using the already proved commutativity
of addition, we can go over to the notation in descending
powers of x.

Consequently, we construct a commutative ring which it is na- 
tural to call a ring of polynomials in the unknown x over the field P. 
This ring is symbolized as P [z]. — 

The ring P [x] contains the field P itself, as was demonstrated 
above. Now, as in the case of rings of polynomials over number fields 
(see Sec. 20), the ring P [x] has a unit element, does not have zero di- 
visors and is not a field. 

If the field P is contained in a larger field $\bar{P}$, then the ring P [x]
is a subring of the ring $\bar{P}[x]$: any polynomial with coefficients from
P can of course be considered a polynomial over the field $\bar{P}$ too;
now the sum and product of polynomials depend solely on their 
coefficients, and for this reason they do not change when passing 
to a larger field. 

To get a still better picture of the true extent of the concept 
of a “ring of polynomials over a field P”, let us examine it from yet 
another angle. 

Let the field P be contained as a subring in some commutative
ring L. The element $\alpha$ of the ring L is called algebraic over the field P
if there exists an equation of degree n, $n \geq 1$, with coefficients from
the field P that is satisfied by the element $\alpha$. If there is no such
equation, then the element $\alpha$ is termed transcendental over the field P.
Naturally, the element x of the ring P [x] is transcendental over the
field P.

The following theorem holds true. 

If the element $\alpha$ of the ring L is transcendental over the field P, then
the subring L' obtained by adjoining the element $\alpha$ to the field P (i.e.,
the minimal subring of the ring L containing the field P and the
element $\alpha$) is isomorphic to the ring P [x] of polynomials.

Indeed, any element $\beta$ of the ring L which can be written as

$$\beta = a_0 \alpha^n + a_1 \alpha^{n-1} + \ldots + a_{n-1} \alpha + a_n, \quad n \geq 0 \tag{4}$$

with coefficients $a_0, a_1, \ldots, a_{n-1}, a_n$ from the field P will be
contained in the subring L'. The element $\beta$ cannot have two distinct
notations of the form (4), since by subtracting one from the other we
would find that there exists an equation over the field P satisfied
by the element $\alpha$, but this is in conflict with the transcendental
nature of this element. Combining the elements of type (4) by the
rules of addition in the ring L, it is of course possible to combine
coefficients of like powers of $\alpha$; but this coincides with the rule for
adding polynomials. On the other hand, by multiplying elements
of form (4) by the rules for multiplication in the ring L, we can,
using the distributive law, perform termwise multiplication and
then collect like terms. This evidently leads to the familiar law of
multiplication of polynomials. This proves that elements of the
type (4) constitute, in the ring L, a subring containing the field P
and the element $\alpha$ (that is, a subring coinciding with L'), and that
this subring is isomorphic to the polynomial ring P [x].

We see that the choice of definitions for operations on polyno- 
mials we made above was not accidental; it is fully determined by 
the fact that the element x of the ring P [x] must be transcendental
over the field P. 

Note that in constructing the polynomial ring P [x] we never
used the division of elements of the field P and only once (namely,
in proving the assertion on the degree of a product of polynomials)
had to refer to the absence of zero divisors in the field P. It is
therefore possible to take an arbitrary commutative ring L and,
repeating the foregoing construction, derive a polynomial ring L [x] over
the ring L; if in this case the ring L does not contain divisors of zero,
the degree of the product of the polynomials will be equal to the sum
of the degrees of the factors and therefore the polynomial ring L [x]
will not contain divisors of zero either.

Returning to polynomials with coefficients from an arbitrary
field P, notice that actually the entire theory of divisibility of
polynomials (described in Secs. 20-22 of this book) is carried over
to this case. Namely, in the ring P [x] we have the division algorithm,
and both the quotient and the remainder will themselves belong
to the ring P [x]. Also, the concept of a divisor is meaningful in the
ring P [x] and all its basic properties are preserved. The fact that the
division algorithm does not take us outside the base field P permits
us to assert that the property of a polynomial $\varphi(x)$ to be a divisor of
f (x) does not depend upon whether we consider the field P or any
extension of it.

Also preserved in the ring P [x] are the definition and all the
properties of a greatest common divisor, together with the Euclidean algorithm
and the theorem proved in Sec. 21 with the aid of this algorithm. Notice
that since the division algorithm is, as we know, independent of the
choice of the base field, we can assert that the greatest common
divisor of two given polynomials is likewise independent of whether we
consider the field P or an arbitrary extension of it, $\bar{P}$.
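
The division algorithm and the Euclidean algorithm mentioned above can be sketched as follows. This is an illustration only, assuming that P is the field of rational numbers (represented by `fractions.Fraction`) and that polynomials are stored as coefficient lists in ascending powers; the names `trim`, `divmod_poly` and `gcd_poly` are ours. Every step uses only the four field operations on the coefficients, which is why neither the quotient, nor the remainder, nor the greatest common divisor can leave the base field.

```python
from fractions import Fraction


def trim(f):
    """Drop trailing zero coefficients (keep at least one entry)."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f


def divmod_poly(f, g):
    """Division algorithm: return (q, r) with f = g*q + r and deg r < deg g."""
    f = [Fraction(a) for a in trim(f)]
    g = [Fraction(a) for a in trim(g)]
    if g == [0]:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f
    while len(r) >= len(g) and r != [0]:
        shift = len(r) - len(g)
        c = r[-1] / g[-1]          # quotient of the leading coefficients
        q[shift] = c
        for i, b in enumerate(g):  # subtract c * x^shift * g from r
            r[shift + i] -= c * b
        r = trim(r)
    return trim(q), r


def gcd_poly(f, g):
    """Euclidean algorithm; the result has leading coefficient 1.

    Assumes f and g are not both zero.
    """
    f = trim([Fraction(a) for a in f])
    g = trim([Fraction(a) for a in g])
    while g != [0]:
        f, g = g, divmod_poly(f, g)[1]
    return [a / f[-1] for a in f]


# x^4 - 1 and x^3 - 1 have the greatest common divisor x - 1.
print(divmod_poly([-1, 0, 0, 0, 1], [-1, 0, 0, 1]))
print(gcd_poly([-1, 0, 0, 0, 1], [-1, 0, 0, 1]))
```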

Finally, for polynomials over the field P, the concept of a root is 
meaningful and the basic properties of roots hold true. Likewise pre- 
served is the theory of multiple roots. Incidentally, we will return 
to this question at the end of the next section. 

These remarks will enable us, in our subsequent study of
polynomials over any field P, to refer to Secs. 20-22.

48. Factorization of Polynomials
into Irreducible Factors


On the basis of the theorem on the existence of a root, we 
proved in Sec. 24 the existence and uniqueness of factorization of a 
polynomial into irreducible factors for fields of complex and real 
numbers. These results are particular cases of general theorems 
referring to polynomials over an arbitrary field P. The present sec- 
tion is devoted to this general theory, which parallels the theory 
of the prime factorization of integers. 

First let us define those polynomials which play the same role 
in the polynomial ring as primes play in the ring of integers. We 
stress from the start that in this definition we deal solely with poly- 
nomials whose degree is greater than or equal to unity. This is in 
full accord with the fact that in the definition of prime numbers and 
in the study of the factorization of integers into prime factors, the 
numbers 1 and —1 are ruled out. 

Suppose we have a polynomial f (x) of degree n, $n \geq 1$, with
coefficients from the field P. By Property V, Sec. 24, all polynomials
of zero degree are divisors of f (x). On the other hand, by Property
VII, all polynomials cf (x), where c is a nonzero element of P, will
also be divisors of f (x); note that these polynomials exhaust all
the divisors (with degree n) of the polynomial f (x). As to divisors
of f (x) whose degree is greater than 0 but less than n, it will be seen
that they may or may not be in the ring P [x]. In the former case,
the polynomial f (x) is called reducible in the field P (or over the
field P), in the latter case, irreducible over this field.

Recalling the definition of a divisor, we may say that a polyno- 
mial f (x) of degree n is reducible over the field P if it can be factored 
over this field (i. e., in the ring P [x]) into a product of two factors 
of degree less than n: 


$$f(x) = \varphi(x)\,\psi(x) \tag{1}$$


and f (x) is irreducible over the field P if in any factorization of the 
type (1), one of its factors is of degree 0 and the other is of degree n. 
Note particularly that one can speak of reducibility or irreducibility
of a polynomial only as regards a given field P, since a polynomial
that is irreducible over one field may prove to be reducible
over some extension $\bar{P}$ of that field. Thus, the polynomial $x^2 - 2$
with integral coefficients is irreducible over the field of rational
numbers: it cannot be factored into a product of two linear factors
with rational coefficients. However, this polynomial is reducible
over the field of real numbers, as the following equation shows:

$$x^2 - 2 = (x - \sqrt{2})(x + \sqrt{2})$$

The polynomial $x^2 + 1$ is irreducible not only over the field of rational
numbers but also over the field of real numbers. It becomes reducible,
however, in the field of complex numbers, since

$$x^2 + 1 = (x - i)(x + i)$$
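
For polynomials of degree two or three, reducibility over a field is equivalent to having a root in that field, since one of the two factors must then be linear. Over the rational numbers this can be checked mechanically. The sketch below is an illustration under stated assumptions: the polynomial has integer coefficients with nonzero constant term and nonzero leading coefficient, and the candidates are taken from the well-known rational root test (a root p/q in lowest terms has p dividing the constant term and q dividing the leading coefficient), which the text itself does not state. The function names are ours.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def evaluate(coeffs, c):
    """Value of a_0 + a_1 x + ... + a_n x^n at x = c (Horner's scheme)."""
    value = Fraction(0)
    for a in reversed(coeffs):
        value = value * c + a
    return value


def rational_roots(coeffs):
    """Rational roots of an integer polynomial [a_0, ..., a_n], a_0 != 0, a_n != 0."""
    candidates = {Fraction(sign * p, q)
                  for p in divisors(coeffs[0])
                  for q in divisors(coeffs[-1])
                  for sign in (1, -1)}
    return sorted(c for c in candidates if evaluate(coeffs, c) == 0)


print(rational_roots([-2, 0, 1]))   # x^2 - 2: no rational roots
print(rational_roots([1, 0, 1]))    # x^2 + 1: no rational roots
print(rational_roots([-1, 0, 1]))   # x^2 - 1: roots -1 and 1
```

An empty list for a polynomial of degree two or three means that it is irreducible over the field of rational numbers; for higher degrees the absence of roots no longer suffices.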


Let us point to certain basic properties of irreducible polyno- 
mials, bearing in mind that we will be speaking of polynomials 
irreducible over the field P. 

(a) Any polynomial of degree one is irreducible. 

This is rather evident since if the polynomial could be factored 
into a product of factors of lower degree, then they would have to 
be of degree 0. But the product of any polynomials of zero degree 
is again a polynomial of zero degree and not first degree. 

(b) If a polynomial p (x) is irreducible, then any polynomial cp (x),
where c is a nonzero element of P, is also irreducible. 

This property follows from Properties I and VII of Sec. 24. It 
will permit us, where necessary, to confine our consideration to 
irreducible polynomials whose leading coefficients are unity. 

(c) If f (x) is an arbitrary polynomial and p (x) is an irreducible
polynomial, then either f (x) is divisible by p (x) or the polynomials
are coprime (relatively prime).

If (f (x), p (x)) = d (x), then d (x), being a divisor of the irreducible
polynomial p (x), is either of degree 0 or is a polynomial of the form
cp (x), $c \neq 0$. In the former case, f (x) and p (x) are coprime, in the
latter, f (x) is divisible by p (x).

(d) If the product of the polynomials f (x) and g (x) is divisible by an
irreducible polynomial p (x), then at least one of these polynomials
is divisible by p (x).

Indeed, if f (x) is not divisible by p (x), then, by (c), f (x) and
p (x) are coprime, and then, by Property (b) of Sec. 24, the
polynomial g (x) must be divisible by p (x).

Property (d) is readily carried over to the case of a product of any 
finite number of factors. 

The two theorems which follow are the main purpose of this 
whole section. 

Any polynomial f (x) in the ring P [x] having degree n, $n \geq 1$,
can be factored into a product of irreducible factors.

Indeed, if a polynomial f (x) is itself irreducible, then the
indicated product consists of only one polynomial. But if it is reducible,
then it can be factored into a product of factors of lower degree. If,
among these factors, we again find reducible ones, then we decompose
them into factors again, etc. This process will cease after a finite
number of steps, since in any factorization of f (x) into factors, the
sum of the degrees of the factors must be equal to n and therefore
the number of factors dependent on x cannot exceed n.







The factorization of integers into prime factors is unique if we 
confine our consideration to positive integers. However, in the 
ring of all integers, uniqueness only occurs to within sign: thus, 
$-6 = 2 \cdot (-3) = (-2) \cdot 3$, $10 = 2 \cdot 5 = (-2) \cdot (-5)$, and so on.
A similar situation obtains in the polynomial ring as well. If

$$f(x) = p_1(x)\, p_2(x) \ldots p_s(x)$$

is a factorization of the polynomial f (x) into a product of irreducible
factors and if the elements $c_1, c_2, \ldots, c_s$ from the field P are such
that their product is equal to 1, then

$$f(x) = [c_1 p_1(x)]\,[c_2 p_2(x)] \ldots [c_s p_s(x)]$$

will also, by (b), be a factorization of f (x) into a product of
irreducible factors. It turns out that this exhausts all factorizations
of f (x).

If a polynomial f (x) from a ring P [x] can be decomposed in two
ways into a product of irreducible factors,

$$f(x) = p_1(x)\, p_2(x) \ldots p_s(x) = q_1(x)\, q_2(x) \ldots q_t(x) \tag{2}$$

then s = t, and, with appropriate numbering, we have the equalities

$$q_i(x) = c_i\, p_i(x), \quad i = 1, 2, \ldots, s \tag{3}$$

where the $c_i$ are nonzero elements from the field P.

This theorem holds for polynomials of degree one, since they 
are irreducible. We will therefore argue by induction with respect 
to the degree of the polynomial, that is, we will prove the theorem 
for f (x), assuming that for polynomials of lower degree it is already 
proved.

Since $q_1(x)$ is a divisor of f (x), it follows, by Property (d) and
equality (2), that $q_1(x)$ will be a divisor of at least one of the
polynomials $p_i(x)$, say of $p_1(x)$. However, since the polynomial $p_1(x)$
is irreducible and the degree of $q_1(x)$ is greater than zero, there exists
an element $c_1$ such that

$$q_1(x) = c_1\, p_1(x) \tag{4}$$

Substituting this expression of $q_1(x)$ into (2) and cancelling $p_1(x)$
(which is permissible since there are no zero divisors in the ring
P [x]), we obtain the equation

$$p_2(x)\, p_3(x) \ldots p_s(x) = [c_1 q_2(x)]\, q_3(x) \ldots q_t(x)$$

Since the degree of the polynomial equal to these products is lower
than that of f (x), it is already proved that s − 1 = t − 1,
whence s = t, and there exist elements $c_2', c_3, \ldots, c_s$ such that
$c_2' p_2(x) = c_1 q_2(x)$, whence $q_2(x) = (c_1^{-1} c_2')\, p_2(x)$, and
$c_i p_i(x) = q_i(x)$, $i = 3, \ldots, s$. Setting $c_1^{-1} c_2' = c_2$ and taking
into account (4), we get the equations (3) completely.

The theorem we have just proved may be stated more succinctly:
every polynomial may be uniquely decomposed into irreducible factors
to within zero-degree factors.

Incidentally, it is always possible to consider the following
special type of factorization which will be completely unique for every
polynomial: take any factorization of the polynomial f (x) into irreducible
factors and factor out of each the leading coefficient. We get the
factorization

$$f(x) = a_0\, p_1(x)\, p_2(x) \ldots p_s(x) \tag{5}$$

where all the $p_i(x)$, $i = 1, 2, \ldots, s$, are irreducible polynomials
with leading coefficients equal to unity. The factor $a_0$ will be equal
to the leading coefficient of the polynomial f (x), as can readily be
verified by multiplying out the right member of (5).

The irreducible factors in (5) do not necessarily have to be
distinct. If an irreducible polynomial p (x) appears several times in
the factorization (5), it is called a multiple factor of f (x), namely
a k-fold (double, triple, etc.) factor if (5) contains exactly k factors
equal to p (x). But if the factor p (x) appears in (5) only once, then
it is called a simple (or single) factor of f (x).

If in the factorization (5) the factors $p_1(x), p_2(x), \ldots, p_l(x)$
are distinct and any other factor is equal to one of them and if $p_i(x)$,
$i = 1, 2, \ldots, l$, is a $k_i$-fold factor of the polynomial f (x), then
(5) may be rewritten as

$$f(x) = a_0\, p_1^{k_1}(x)\, p_2^{k_2}(x) \ldots p_l^{k_l}(x) \tag{6}$$

This is the notation that we will ordinarily make use of without
specifying that the exponents are equal to the multiplicities of the
corresponding factors, i.e., that $p_i(x) \neq p_j(x)$ for $i \neq j$.

If we are given the factorizations of the polynomials f (x) and g (x) 
into irreducible factors, then the greatest common divisor d (x) of these 
polynomials is equal to the product of the factors appearing in both 
factorizations at the same time, and each factor is taken to the power 
equal to the least of its multiplicities in the two given polynomials. 

Indeed, the indicated product will be a divisor of each of the
polynomials f (x), g (x) and therefore also of d (x). If this product
were different from d (x), then the factorization of d (x) into
irreducible factors would either contain a factor that does not appear in
the factorization of at least one of the polynomials f (x) and g (x),
which is impossible, or one of the factors would have a higher power
than it has in the factorization of one of the polynomials f (x) and
g (x), which is again impossible.
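
When the factorizations are already known, the rule just proved reduces to taking minima of exponents. A minimal sketch, under the assumption that a factorization is stored as a dictionary mapping each irreducible factor with leading coefficient 1 (encoded here, hypothetically, as a tuple of coefficients in ascending powers) to its multiplicity:

```python
def gcd_from_factorizations(f_factors, g_factors):
    """Common factors, each taken to the least of its two multiplicities."""
    return {p: min(k, g_factors[p])
            for p, k in f_factors.items()
            if p in g_factors}


# Illustrative irreducible factors over the rational numbers.
x_minus_1 = (-1, 1)      # x - 1
x_plus_1 = (1, 1)        # x + 1
x2_plus_1 = (1, 0, 1)    # x^2 + 1

f = {x_minus_1: 2, x_plus_1: 1, x2_plus_1: 1}   # (x-1)^2 (x+1) (x^2+1)
g = {x_minus_1: 1, x_plus_1: 3}                 # (x-1) (x+1)^3

d = gcd_from_factorizations(f, g)
print(d)   # the factors x - 1 and x + 1, each with multiplicity 1
```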

This theorem is similar to the rule ordinarily used to find the 
greatest common divisor of integers. However, in the case of poly- 
nomials, it cannot replace the Euclidean algorithm, for, since there 
is only a finite number of primes less than a given positive integer,
the factorization of an integer into prime factors is attained by
a finite number of trials. This is not the case in a polynomial ring 
over an infinite base field, and, in the general case, one cannot 
offer a method for factoring polynomials into irreducible factors. 
What is more, it is very hard even to decide in the general case the 
question of whether a polynomial f (x) is irreducible over a given 
field P. Thus, the description of all irreducible polynomials for the 
case of the fields of complex and real numbers was obtained in Sec. 24 
as a corollary to a very profound theorem on the existence of a root. 
As to the field of rational numbers, only a few assertions of a spe- 
cific nature concerning polynomials that are irreducible over this 
field will be made in Sec. 56. 

We have shown that in the polynomial ring (as in the ring of 
integers) we have a factorization into “prime” (irreducible) factors 
and that this factorization is in a certain sense unique. The question 
arises as to whether it is possible to carry over these results to broader 
classes of rings. We confine ourselves here to the case of such commu- 
tative rings as have a unit element and do not have divisors of zero. 

We will use the term divisor of unity for an element a of the ring
such that in this ring there exists an inverse element $a^{-1}$:

$$a\, a^{-1} = 1$$

In the ring of integers, these are the numbers 1 and −1, in the ring
P [x] of polynomials, all the polynomials of zero degree (that is,
nonzero elements of the field P). The element c, which is nonzero
and is not a divisor of unity, will be called a prime element of the
ring if in any decomposition of it into a product of two factors,
c = ab, one of the factors is invariably a divisor of unity. In the
ring of integers, the prime elements are prime numbers, in the
polynomial ring they are irreducible polynomials.

Will every element of the ring under consideration that is non- 
zero and is not a divisor of unity be decomposable into a product 
of prime factors? If it is, will the factorization be unique? This is 
to be understood as follows: if 


$$a = p_1 p_2 \ldots p_k = q_1 q_2 \ldots q_l$$

are two factorizations of the element a into prime factors, then
k = l and (possibly after a change in the numbering)

$$q_i = p_i c_i, \quad i = 1, 2, \ldots, k$$

where $c_i$ is a divisor of unity.

It turns out that in both instances the answer is no. We give
one example, namely, we indicate a ring in which factorization
into prime factors is possible but not unique.
Consider complex numbers of the form

$$\alpha = a + b\sqrt{-3} \tag{7}$$

where a and b are integers. All such numbers form a ring without
divisors of zero and containing a unit element; indeed,

$$(a + b\sqrt{-3})(c + d\sqrt{-3}) = (ac - 3bd) + (bc + ad)\sqrt{-3} \tag{8}$$

We use the term norm of a number $\alpha = a + b\sqrt{-3}$ for the positive
(for $\alpha \neq 0$) integer

$$N(\alpha) = a^2 + 3b^2$$

By (8), the norm of a product is equal to the product of the norms

$$N(\alpha\beta) = N(\alpha)\, N(\beta) \tag{9}$$

Indeed,

$$(ac - 3bd)^2 + 3(bc + ad)^2 = a^2c^2 + 9b^2d^2 + 3b^2c^2 + 3a^2d^2 = (a^2 + 3b^2)(c^2 + 3d^2)$$

If in our ring the number $\alpha$ is a divisor of unity, that is, the
number $\alpha^{-1}$ is also of the form (7), then, by (9),

$$N(\alpha) \cdot N(\alpha^{-1}) = N(\alpha\alpha^{-1}) = N(1) = 1$$

and therefore $N(\alpha) = 1$, since the numbers $N(\alpha)$ and $N(\alpha^{-1})$ are
integers and are positive. If $\alpha = a + b\sqrt{-3}$, then from $N(\alpha) = 1$
it follows that

$$N(\alpha) = a^2 + 3b^2 = 1$$

which, however, is possible only when b = 0, a = ±1. Thus, in
our ring, as in the ring of integers, only the numbers 1 and −1 will
be divisors of unity, and only these numbers have a norm equal to unity.

The equation (9) for the norm of a product can naturally be
extended to the case of any finite number of factors. It is thus easy
to conclude that any number $\alpha$ in our ring that is nonzero and is not a
divisor of unity can be factored into a product of a finite number of
prime factors. We leave the proof to the reader.

However, we cannot assert that the factorization into prime factors
is unique. For example, the following equations hold true:

$$4 = 2 \cdot 2 = (1 + \sqrt{-3})(1 - \sqrt{-3})$$

In our ring there are no other divisors of unity except 1 and −1,
and so the number $1 + \sqrt{-3}$ (like the number $1 - \sqrt{-3}$) cannot
differ from the number 2 solely by a factor which is a divisor of unity.
It remains to show that each one of the numbers 2, $1 + \sqrt{-3}$,
$1 - \sqrt{-3}$ will be prime in the ring under consideration. Indeed, the
norm of each of these three numbers is equal to 4. Let $\alpha$ be any one
of these numbers and let

$$\alpha = \beta\gamma$$

Then, by (9), one of the following three cases is possible:
(1) $N(\beta) = 4$, $N(\gamma) = 1$; (2) $N(\beta) = 1$, $N(\gamma) = 4$;
(3) $N(\beta) = N(\gamma) = 2$. In the first case, the number $\gamma$ will, as we know, be
a divisor of unity; in the second case, $\beta$ will be a divisor of unity.
The third case is impossible due to the impossibility of the equality

$$a^2 + 3b^2 = 2$$

where a and b are integers.
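
The example can be checked by direct computation. In the sketch below a number $a + b\sqrt{-3}$ is stored as the pair `(a, b)`; multiplication follows (8) and the norm follows its definition. The divisibility test and the brute-force primality check are our own additions for illustration (they rely on the conjugate $c - d\sqrt{-3}$, which the text does not use), not the argument given above.

```python
def mul(u, v):
    """Product by formula (8): (a + b*sqrt(-3)) * (c + d*sqrt(-3))."""
    (a, b), (c, d) = u, v
    return (a * c - 3 * b * d, b * c + a * d)


def norm(u):
    """N(a + b*sqrt(-3)) = a^2 + 3*b^2."""
    a, b = u
    return a * a + 3 * b * b


def is_unit(u):
    """Divisors of unity are exactly the elements of norm 1."""
    return norm(u) == 1


def divides(beta, alpha):
    """True if alpha / beta lies in the ring (beta must be nonzero).

    alpha / beta = alpha * conj(beta) / N(beta), so both coordinates of
    alpha * conj(beta) must be divisible by N(beta).
    """
    (c, d), (a, b) = beta, alpha
    n = norm(beta)
    return (a * c + 3 * b * d) % n == 0 and (b * c - a * d) % n == 0


def is_prime_element(alpha):
    """Brute-force check: no divisor with norm strictly between 1 and N(alpha)."""
    n = norm(alpha)
    if n <= 1:
        return False
    bound = int(n ** 0.5) + 1
    for c in range(-bound, bound + 1):
        for d in range(-bound, bound + 1):
            beta = (c, d)
            if 1 < norm(beta) < n and divides(beta, alpha):
                return False
    return True


two, plus, minus = (2, 0), (1, 1), (1, -1)
print(mul(two, two), mul(plus, minus))              # both products equal 4
print([is_prime_element(u) for u in (two, plus, minus)])
print([u for u in (two, plus, minus) if is_unit(u)])  # none of them is a unit
```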

Multiple factors. Although, as has been demonstrated above, 
we are not able to decompose polynomials into irreducible factors, 
there exist methods which enable us to determine whether a given 
polynomial has multiple factors or not and, if it does, to reduce 
the study of that polynomial to the study of polynomials that do not 
contain multiple factors. True, these methods require that we impose 
certain restrictions on the base field. In the rest of this section we 
will assume that the field P has characteristic 0. Without this 
restriction, the theorems on multiple factors that will be proved 
below break down. At the same time, the case of fields of characte- 
ristic zero is the most important one from the viewpoint of appli- 
cations since, for one thing, all number fields are included here. 

To begin with, notice that we can extend to this case both the 
concept of a derivative of a polynomial (introduced in Sec. 22 for 
polynomials with complex coefficients) and the basic properties 
of this concept.* Let us now prove the following theorem. 

If p (x) is a k-fold irreducible factor of the polynomial f (x), $k \geq 1$,
then it will be a (k − 1)-fold factor of the derivative of this
polynomial. In particular, a simple factor of the polynomial does not enter
into the factorization of the derivative.

Indeed, let

$$f(x) = p^k(x)\, g(x) \tag{10}$$

where g (x) is no longer divisible by p (x). Differentiating (10), we get

$$f'(x) = p^k(x)\, g'(x) + k\, p^{k-1}(x)\, p'(x)\, g(x) = p^{k-1}(x)\, [p(x)\, g'(x) + k\, p'(x)\, g(x)]$$

The second term in the brackets is not divisible by p (x); indeed,
g (x) is not divisible by p (x) by hypothesis, p' (x) is of lower degree,
i.e., it is not divisible by p (x) either; hence, due to the
irreducibility of the polynomial p (x) and Property (d) of this section and
Property IX of Sec. 21, our assertion follows. On the other hand,
the first term in the sum in the square brackets is divisible by p (x)
and so the entire sum cannot be divisible by p (x); which is to say
that the factor p (x) does indeed appear in f' (x) with a multiplicity
of k − 1.


* For fields of a finite characteristic, the assertion that the derivative of
a polynomial of degree n is of degree n − 1 fails.

From our theorem and from the above-indicated method of
finding the greatest common divisor of two polynomials it follows that
if a factorization of the polynomial f (x) into irreducible factors is
given,

$$f(x) = a_0\, p_1^{k_1}(x)\, p_2^{k_2}(x) \ldots p_l^{k_l}(x) \tag{11}$$

then the greatest common divisor of f (x) and of its derivative has the
following factorization into irreducible factors:

$$(f(x), f'(x)) = p_1^{k_1 - 1}(x)\, p_2^{k_2 - 1}(x) \ldots p_l^{k_l - 1}(x) \tag{12}$$

where the factor $p_i^{k_i - 1}(x)$ should naturally be replaced by unity
for $k_i = 1$. In particular, a polynomial f (x) does not contain
multiple factors if and only if it is relatively prime to its derivative.
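
This criterion is easy to apply without knowing any factorization. The following sketch assumes, for illustration, that P is the field of rational numbers and that polynomials are coefficient lists in ascending powers; all function names are ours. It computes the derivative formally and then the greatest common divisor by the Euclidean algorithm.

```python
from fractions import Fraction


def trim(f):
    """Drop trailing zero coefficients (keep at least one entry)."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f


def remainder(f, g):
    """Remainder of f on division by a nonzero polynomial g."""
    r = [Fraction(a) for a in trim(f)]
    g = [Fraction(a) for a in trim(g)]
    while len(r) >= len(g) and r != [0]:
        shift = len(r) - len(g)
        c = r[-1] / g[-1]
        for i, b in enumerate(g):
            r[shift + i] -= c * b
        r = trim(r)
    return r


def gcd_poly(f, g):
    """Euclidean algorithm, result with leading coefficient 1 (f, g not both zero)."""
    f = trim([Fraction(a) for a in f])
    g = trim([Fraction(a) for a in g])
    while g != [0]:
        f, g = g, remainder(f, g)
    return [a / f[-1] for a in f]


def derivative(f):
    """Formal derivative: the coefficient a_k of x^k becomes k * a_k at x^(k-1)."""
    return trim([k * a for k, a in enumerate(f)][1:] or [0])


def has_multiple_factors(f):
    """True if f is not relatively prime to its derivative (characteristic 0)."""
    return len(gcd_poly(f, derivative(f))) > 1


# x^3 - 3x + 2 = (x - 1)^2 (x + 2) has a double factor; x^2 - 2 has none.
print(gcd_poly([2, -3, 0, 1], derivative([2, -3, 0, 1])))
print(has_multiple_factors([2, -3, 0, 1]), has_multiple_factors([-2, 0, 1]))
```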

We now know how to answer the question of the existence of
multiple factors in a given polynomial. What is more, since neither
the derivative of a polynomial nor the greatest common divisor of
two polynomials depends on whether we are considering the field
P or any extension $\bar{P}$ of it, we obtain the following corollary to
the result that has just been proved.

If a polynomial f (x) with coefficients in a field P of characteristic
zero does not have multiple factors over this field, then neither will
there be any multiple factors over any extension $\bar{P}$ of the field P.

In particular, if f (x) is irreducible over P and $\bar{P}$ is some
extension of P, then, although f (x) can be reducible over $\bar{P}$, it will
definitely not be divisible by the square of an irreducible (over $\bar{P}$)
polynomial.

Isolating multiple factors. If we have a polynomial f (x) with
the factorization (11) and if by $d_1(x)$ we denote the greatest
common divisor of f (x) and of its derivative f' (x), then (12) will be a
factorization of $d_1(x)$. Dividing (11) by (12), we get

$$v_1(x) = \frac{f(x)}{d_1(x)} = a_0\, p_1(x)\, p_2(x) \ldots p_l(x)$$

That is, we obtain a polynomial without multiple factors, and any
irreducible factor of $v_1(x)$ will also be a factor of f (x). In this way,
finding the irreducible factors of f (x) is reduced to finding them for
the polynomial $v_1(x)$ which, generally speaking, is of lower degree
and, at any rate, contains only simple factors. If the problem is
solved for $v_1(x)$, then it only remains to determine the multiplicity
of the irreducible factors found in f (x); this is done by means of the
division algorithm.

A more sophisticated variant of this method enables us to con- 
sider several polynomials without multiple factors; also, having 
found the irreducible factors of these polynomials, we not only 
find all the irreducible factors of f (x), but also their multiplicities.

Let (11) be a factorization of f (x) into irreducible factors, the
greatest multiplicity of the factors being s, $s \geq 1$. Denote by $F_1(x)$
the product of all single factors of f (x), by $F_2(x)$ the product of all
double factors, but taken only once at a time, and so forth; finally,
denote by $F_s(x)$ the product of all s-fold factors taken once at a
time, as before. If under these conditions, for some j, there
are no j-fold factors in f (x), set $F_j(x) = 1$. Then f (x) will be divisible by
the kth power of the polynomial $F_k(x)$, $k = 1, 2, \ldots, s$, and
the factorization (11) becomes

$$f(x) = a_0\, F_1(x)\, F_2^2(x)\, F_3^3(x) \ldots F_s^s(x)$$

and the factorization (12) for $d_1(x) = (f(x), f'(x))$ will be rewritten as

$$d_1(x) = F_2(x)\, F_3^2(x) \ldots F_s^{s-1}(x)$$

Denoting by $d_2(x)$ the greatest common divisor of the polynomial
$d_1(x)$ and of its derivative, and generally by $d_k(x)$ the greatest
common divisor of the polynomials $d_{k-1}(x)$ and $d_{k-1}'(x)$, we obtain
in the same fashion

$$\begin{aligned}
d_2(x) &= F_3(x)\, F_4^2(x) \ldots F_s^{s-2}(x), \\
d_3(x) &= F_4(x)\, F_5^2(x) \ldots F_s^{s-3}(x), \\
&\;\;\vdots \\
d_{s-1}(x) &= F_s(x), \\
d_s(x) &= 1
\end{aligned}$$
Whence

$$\begin{aligned}
v_1(x) &= \frac{f(x)}{d_1(x)} = a_0\, F_1(x)\, F_2(x)\, F_3(x) \ldots F_s(x), \\
v_2(x) &= \frac{d_1(x)}{d_2(x)} = F_2(x)\, F_3(x) \ldots F_s(x), \\
v_3(x) &= \frac{d_2(x)}{d_3(x)} = F_3(x) \ldots F_s(x), \\
&\;\;\vdots
\end{aligned}$$

and, therefore, finally,

$$F_1(x) = \frac{v_1(x)}{a_0\, v_2(x)}, \quad F_2(x) = \frac{v_2(x)}{v_3(x)}, \quad \ldots, \quad F_s(x) = v_s(x)$$

Thus, using only procedures that do not require a knowledge
of the irreducible factors of the polynomial f (x), namely, taking
the derivative, using the Euclidean algorithm and the division
algorithm, we can find the polynomials $F_1(x), F_2(x), \ldots, F_s(x)$
without multiple factors; every irreducible factor of the polynomial
$F_k(x)$, $k = 1, 2, \ldots, s$, will be k-fold for f (x).
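
The whole procedure can be sketched in a few lines. As before, this is an illustration under assumptions of our own: P is the field of rational numbers, polynomials are coefficient lists in ascending powers, f has degree at least one, and every greatest common divisor is normalized to leading coefficient 1 (so that the leading coefficient of f is divided out of $v_1$). The function names are hypothetical.

```python
from fractions import Fraction


def trim(f):
    """Drop trailing zero coefficients (keep at least one entry)."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f


def divmod_poly(f, g):
    """Division algorithm: (q, r) with f = g*q + r, deg r < deg g, g nonzero."""
    r = [Fraction(a) for a in trim(f)]
    g = [Fraction(a) for a in trim(g)]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g) and r != [0]:
        shift = len(r) - len(g)
        c = r[-1] / g[-1]
        q[shift] = c
        for i, b in enumerate(g):
            r[shift + i] -= c * b
        r = trim(r)
    return trim(q), r


def monic(f):
    """Divide by the leading coefficient."""
    return [a / f[-1] for a in f]


def gcd_poly(f, g):
    """Euclidean algorithm, result with leading coefficient 1."""
    f, g = trim(f), trim(g)
    while g != [0]:
        f, g = g, divmod_poly(f, g)[1]
    return monic([Fraction(a) for a in f])


def derivative(f):
    """Formal derivative of a coefficient list."""
    return trim([k * a for k, a in enumerate(f)][1:] or [0])


def isolate_multiple_factors(f):
    """Return [F_1, ..., F_s], each with leading coefficient 1 (deg f >= 1)."""
    d = [trim([Fraction(a) for a in f])]          # d[0] = f
    while len(d[-1]) > 1:                         # stop when d_s is a constant
        d.append(gcd_poly(d[-1], derivative(d[-1])))
    # v_k = d_{k-1} / d_k, made monic so that a_0 drops out of v_1
    v = [monic(divmod_poly(d[k - 1], d[k])[0]) for k in range(1, len(d))]
    F = [divmod_poly(v[k], v[k + 1])[0] for k in range(len(v) - 1)]
    F.append(v[-1])                               # F_s = v_s
    return F


# f(x) = x (x - 1)^2 (x + 1)^3 = x^6 + x^5 - 2x^4 - 2x^3 + x^2 + x
factors = isolate_multiple_factors([0, 1, 1, -2, -2, 1, 1])
print(factors)   # coefficient lists of F_1, F_2, F_3
```

For this f the procedure should return the lists for x, x − 1 and x + 1, that is, the products of the single, double and triple factors respectively.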


This method cannot, of course, be regarded as a procedure for
factoring a polynomial into irreducible factors, since for the case
of s = 1 (that is, for a polynomial without multiple factors) we
only get $f(x) = a_0 F_1(x)$.

49. Theorem on the Existence of a Root 


Quite naturally, the fundamental theorem (proved in Sec. 23) 
on the existence, for every numerical polynomial, of a root in the 
field of complex numbers cannot be extended to the case of an arbi- 
trary field. In this section we will prove a theorem which in the 
general theory of fields replaces to some extent the afore-mentioned 
fundamental theorem of the algebra of complex numbers. 

Let there be given a polynomial f (x) over a field P. A natural
question arises: if the polynomial f (x) does not have any roots at all
in the field P, then does there exist an extension $\bar{P}$ of P in which there
will be at least one root of f (x)? We can assume that the degree of
the polynomial f (x) is greater than unity: the question is meaningless
for a zero-degree polynomial, and every polynomial of degree
one, ax + b, has the root $-\frac{b}{a}$ in the field P itself. On the other
hand, we can evidently confine ourselves to the case of f (x) being
irreducible: if it is reducible over P, then the root of any one of its
irreducible factors will be a root of f (x) itself.

The answer to the question that interests us is given by the 
following theorem on the existence of a root. 

For every polynomial f (x) that is irreducible over the field P there
is an extension of this field that contains a root of f (x). All
minimal fields containing the field P and a root of this polynomial are
isomorphic among themselves.

Let us first prove the second part of the theorem. 

Suppose we have a polynomial irreducible over P:

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_{n-1} x + a_n \tag{1}$$

and $n \geq 2$, that is, f (x) has no roots in the field P itself. Suppose
that there is an extension $\bar{P}$ of P which contains a root $\alpha$ of f (x).
Let us prove the following lemma which will be needed later on but
which is of interest in itself.

If a root $\alpha$, in $\bar{P}$, of a polynomial f (x) which is irreducible over
P serves also as a root of some polynomial g (x) in the ring P [x], then
f (x) will be a divisor of g (x).

Indeed, the polynomials f (x) and g (x) over the field $\bar{P}$ have
a common divisor $x - \alpha$ and so are not relatively prime. The
property of polynomials not to be relatively prime does not, however,
depend on the choice of the field. It is therefore possible to pass
to the field P and apply Property (c) of Sec. 48.

Now let us find the minimal subfield $P(\alpha)$ of $\bar{P}$ which contains
the field P and the element $\alpha$. It definitely includes all elements of
the form

$$\beta = b_0 + b_1 \alpha + b_2 \alpha^2 + \ldots + b_{n-1} \alpha^{n-1} \tag{2}$$

where $b_0, b_1, b_2, \ldots, b_{n-1}$ are elements of P. No element of $\bar{P}$ can
have two distinct notations of the form (2); if it is also true that

$$\beta = c_0 + c_1 \alpha + c_2 \alpha^2 + \ldots + c_{n-1} \alpha^{n-1}$$

and for at least one k, $c_k \neq b_k$, then $\alpha$ will be a root of the polynomial

$$g(x) = (b_0 - c_0) + (b_1 - c_1) x + (b_2 - c_2) x^2 + \ldots + (b_{n-1} - c_{n-1}) x^{n-1}$$

which runs counter to the lemma proved above since the degree
of g (x) is lower than the degree of f (x).


The elements of the field $\bar{P}$ having the form (2) include all the
elements of the field P (for $b_1 = b_2 = \ldots = b_{n-1} = 0$), and also
the element $\alpha$ itself (for $b_1 = 1$, $b_0 = b_2 = \ldots = b_{n-1} = 0$).
We now prove that elements of the form (2) constitute the entire
sought-for subfield $P(\alpha)$. Indeed, if we are given elements $\beta$ [with notation
(2)] and

$$\gamma = c_0 + c_1 \alpha + c_2 \alpha^2 + \ldots + c_{n-1} \alpha^{n-1}$$

then, on the basis of the properties of operations in the field $\bar{P}$,

$$\beta \pm \gamma = (b_0 \pm c_0) + (b_1 \pm c_1) \alpha + (b_2 \pm c_2) \alpha^2 + \ldots + (b_{n-1} \pm c_{n-1}) \alpha^{n-1}$$

That is to say, the sum and difference of any two elements of the
type (2) are again elements of that type.

If we multiply $\beta$ and $\gamma$, we get an expression containing
$\alpha^n$ and other higher powers of $\alpha$. However, it follows from
(1) and the equality $f(\alpha) = 0$ that $\alpha^n$ and therefore
$\alpha^{n+1}$, $\alpha^{n+2}$ and so on can be expressed in terms of
lower powers of the element $\alpha$. The simplest way of finding an
expression for $\beta\gamma$ is this: let

$$\varphi(x) = b_0 + b_1 x + \dots + b_{n-1}x^{n-1},$$
$$\psi(x) = c_0 + c_1 x + \dots + c_{n-1}x^{n-1}$$

whence $\varphi(\alpha) = \beta$, $\psi(\alpha) = \gamma$. Multiply the
polynomials $\varphi(x)$ and $\psi(x)$ and divide the product by $f(x)$.
This yields

$$\varphi(x)\psi(x) = f(x)q(x) + r(x) \tag{3}$$

where

$$r(x) = d_0 + d_1 x + \dots + d_{n-1}x^{n-1}$$

Taking the values of both sides of (3) for $x = \alpha$ we find that

$$\varphi(\alpha)\psi(\alpha) = f(\alpha)q(\alpha) + r(\alpha)$$

That is to say, by $f(\alpha) = 0$,

$$\beta\gamma = d_0 + d_1\alpha + \dots + d_{n-1}\alpha^{n-1}$$

Thus, the product of two elements of the type (2) will again be an
element of this type.
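
The rule just described (multiply the two polynomials, then keep the
remainder on division by $f(x)$) can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the base
field $P$ is taken to be the rational numbers, an element of type (2) is
stored as its coefficient list $[b_0, b_1, \dots, b_{n-1}]$, and the
function names are hypothetical, not taken from the text.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_rem(a, f):
    """Remainder r(x) of a(x) on division by f(x); f[-1] must be nonzero."""
    r = [Fraction(c) for c in a]
    n = len(f) - 1                      # degree of f
    while len(r) > n:
        lead = r[-1] / f[-1]
        shift = len(r) - 1 - n
        for i, fi in enumerate(f):
            r[shift + i] -= lead * fi   # cancels the current leading term
        r.pop()                         # leading coefficient is now zero
    return r + [Fraction(0)] * (n - len(r))

def mul_in_extension(beta, gamma, f):
    """Coefficients d_0, ..., d_{n-1} of beta*gamma, as in equation (3)."""
    return poly_rem(poly_mul(beta, gamma), f)

# Assumed example: f(x) = x^3 - 2 over the rationals, alpha a root of f.
f = [Fraction(-2), Fraction(0), Fraction(0), Fraction(1)]
beta = [1, 1, 0]     # 1 + alpha
gamma = [0, 0, 1]    # alpha^2
print(mul_in_extension(beta, gamma, f))   # coefficients of 2 + alpha^2
```

Here $(1 + \alpha)\alpha^2 = \alpha^2 + \alpha^3$, and $\alpha^3 = 2$ by
$f(\alpha) = 0$, so the product is $2 + \alpha^2$. Only the coefficients
enter the computation, which is the point used in the isomorphism
argument.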
Finally, we will show that if element $\beta$ is of the type (2),
$\beta \neq 0$, then the element $\beta^{-1}$ existing in the field
$\bar P$ can also be written as (2). To do this, take the polynomial

$$\varphi(x) = b_0 + b_1 x + \dots + b_{n-1}x^{n-1}$$

in the ring $P[x]$. Since the degree of $\varphi(x)$ is lower than the
degree of $f(x)$, and the polynomial $f(x)$ is irreducible over $P$, it
follows that $\varphi(x)$ and $f(x)$ are relatively prime and therefore,
by Secs. 24 and 47, there exist in the ring $P[x]$ polynomials $u(x)$
and $v(x)$ such that

$$\varphi(x)u(x) + f(x)v(x) = 1$$

We can assume here that the degree of $u(x)$ is less than $n$:

$$u(x) = s_0 + s_1 x + \dots + s_{n-1}x^{n-1}$$

Whence, by $f(\alpha) = 0$, it follows that

$$\varphi(\alpha)u(\alpha) = 1$$

and therefore, by the equality $\varphi(\alpha) = \beta$, we have

$$\beta^{-1} = u(\alpha) = s_0 + s_1\alpha + \dots + s_{n-1}\alpha^{n-1}$$
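
The polynomial $u(x)$ in the argument above can be found by the
Euclidean algorithm. The following Python sketch is one possible way to
do this, again assuming that $P$ is the field of rational numbers and
that polynomials are stored as coefficient lists with the lowest degree
first; all function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros; coefficient lists run from degree 0 upward."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return trim(x + y for x, y in zip(a, b))

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by a nonzero, trimmed polynomial b."""
    q, r = [], trim(a)
    while len(r) >= len(b):
        c = r[-1] / b[-1]
        term = [Fraction(0)] * (len(r) - len(b)) + [c]
        q = add(q, term)
        r = add(r, mul([-t for t in term], b))
    return q, r

def inverse_mod(phi, f):
    """Return u(x) of degree < deg f with phi(x)u(x) + f(x)v(x) = 1."""
    f = trim(Fraction(c) for c in f)
    r0, r1 = f, trim(Fraction(c) for c in phi)
    u0, u1 = [], [Fraction(1)]      # invariant: r_i = u_i * phi (mod f)
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, add(u0, mul([-c for c in q], u1))
    if len(r0) != 1:                # gcd is not a nonzero constant
        raise ValueError("phi and f are not relatively prime")
    u = [c / r0[0] for c in u0]     # normalize the gcd to 1
    return divmod_poly(u, f)[1]

# Assumed example: f(x) = x^3 - 2, beta = 1 + alpha.
print(inverse_mod([1, 1], [-2, 0, 0, 1]))   # coefficients of (1 - alpha + alpha^2)/3
```

The result agrees with a direct check:
$(1 + \alpha)(1 - \alpha + \alpha^2) = 1 + \alpha^3 = 3$.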


Thus, the collection of elements of the field $\bar P$ having the form
(2) constitutes a subfield of $\bar P$, which is the desired field
$P(\alpha)$. Furthermore, since we saw that in seeking the sum and
product of the elements $\beta$ and $\gamma$ of the type (2) we need
only know the coefficients of the expressions of these elements in
terms of powers of $\alpha$, we can assert the truth of the following
result. If besides $\bar P$ there is another extension $\bar P'$ of the
field $P$, which also contains a root $\alpha'$ of the polynomial
$f(x)$, and if $P(\alpha')$ is a minimal subfield of the field $\bar P'$
containing $P$ and $\alpha'$, then the fields $P(\alpha)$ and
$P(\alpha')$ are isomorphic. To obtain the isomorphic correspondence
between them, it is necessary to associate with the element $\beta$ of
type (2) in $P(\alpha)$ an element

$$\beta' = b_0 + b_1\alpha' + b_2\alpha'^2 + \dots + b_{n-1}\alpha'^{n-1}$$

in $P(\alpha')$ having the same coefficients. This completes the proof
of the second part of the theorem.

Let us now prove the basic first part of this theorem. The foregoing
will help to point the way. We have a polynomial $f(x)$ of degree
$n \geq 2$ that is irreducible over the field $P$ and it is required to
construct an extension of $P$ containing a root of $f(x)$. To do this,
let us take the entire polynomial ring $P[x]$ and partition it into
disjoint classes, combining in one class the polynomials which yield
the same remainders upon division by the given polynomial $f(x)$.
In other words, the polynomials $\varphi(x)$ and $\psi(x)$ belong to the
same class if their difference is exactly divisible by $f(x)$.

We agree to denote the resulting classes by the letters $A$, $B$, $C$
and so on and to define the sum and product of classes in the following
natural manner. Take any two classes $A$ and $B$; choose in $A$ a
polynomial $\varphi_1(x)$, in $B$ a polynomial $\psi_1(x)$ and denote by
$\chi_1(x)$ the sum of these polynomials:

$$\chi_1(x) = \varphi_1(x) + \psi_1(x)$$

and by $\theta_1(x)$ their product:

$$\theta_1(x) = \varphi_1(x)\psi_1(x)$$

Now choose any other polynomial $\varphi_2(x)$ in $A$ and any polynomial
$\psi_2(x)$ in $B$ and denote by $\chi_2(x)$ and $\theta_2(x)$ their sum
and product, respectively:

$$\chi_2(x) = \varphi_2(x) + \psi_2(x),$$
$$\theta_2(x) = \varphi_2(x)\psi_2(x)$$

By hypothesis, the polynomials $\varphi_1(x)$ and $\varphi_2(x)$ are in
the same class $A$ and therefore their difference
$\varphi_1(x) - \varphi_2(x)$ is exactly divisible by $f(x)$; the
difference $\psi_1(x) - \psi_2(x)$ has the same property. From this it
follows that the difference

$$\chi_1(x) - \chi_2(x) = [\varphi_1(x) + \psi_1(x)] - [\varphi_2(x) + \psi_2(x)] = [\varphi_1(x) - \varphi_2(x)] + [\psi_1(x) - \psi_2(x)] \tag{4}$$

is also exactly divisible by the polynomial $f(x)$. This is also true
of the difference $\theta_1(x) - \theta_2(x)$ since

$$\theta_1(x) - \theta_2(x) = \varphi_1(x)\psi_1(x) - \varphi_2(x)\psi_2(x)$$
$$= \varphi_1(x)\psi_1(x) - \varphi_1(x)\psi_2(x) + \varphi_1(x)\psi_2(x) - \varphi_2(x)\psi_2(x)$$
$$= \varphi_1(x)[\psi_1(x) - \psi_2(x)] + [\varphi_1(x) - \varphi_2(x)]\psi_2(x) \tag{5}$$
Equation (4) shows that the polynomials $\chi_1(x)$ and $\chi_2(x)$ lie
in the same class. In other words, the sum of any polynomial from
class $A$ and any polynomial from class $B$ belongs to a very definite
class $C$, which does not depend on what polynomials are chosen as
"representatives" in classes $A$ and $B$. We call this class $C$ the sum
of the classes $A$ and $B$:

$$C = A + B$$

Similarly, because of (5), there is a class $D$ which is independent
of the choice of representatives in classes $A$ and $B$ and in which lies
the product of any polynomial of $A$ by any polynomial of $B$. We
call this class the product of the classes $A$ and $B$:

$$D = AB$$


We shall show that the collection of classes into which we have
partitioned the ring $P[x]$ of polynomials is converted into a field
after the indicated introduction of the operations of addition and
multiplication. Indeed, the validity of the associative and
commutative laws for both operations and of the distributive law follows
from the validity of these laws in the ring $P[x]$, since operations
on classes reduce to operations on the polynomials lying in these
classes. The role of zero is evidently played by the class composed of
polynomials divisible exactly by the polynomial $f(x)$. We call
this the zero class and denote it by the symbol 0. The opposite of
class $A$, which is made up of polynomials that yield the remainder
$\varphi(x)$ upon division by $f(x)$, is the class made up of polynomials
which yield the remainder $-\varphi(x)$ upon division by $f(x)$, whence
it follows that subtraction is unique on the set of classes.

To prove that division is possible on the set of classes, we have
to show that there exists a class playing the role of unity and that
for any class different from zero there is an inverse class. The class
of polynomials which upon division by $f(x)$ yield a remainder 1
will obviously be unity. We call this the unit class and denote it
by the symbol $E$.

Now suppose we have a class $A$ different from zero. A polynomial
$\varphi(x)$ chosen in $A$ as a representative will thus not be exactly
divisible by $f(x)$ and therefore, because of the irreducibility of
$f(x)$, these two polynomials are relatively prime. Thus, in the ring
$P[x]$ there exist polynomials $u(x)$ and $v(x)$ that satisfy the
equation

$$\varphi(x)u(x) + f(x)v(x) = 1$$

whence

$$\varphi(x)u(x) = 1 - f(x)v(x) \tag{6}$$

Upon division by $f(x)$, the right member of (6) yields a remainder 1,
which means it belongs to the unit class $E$. If the class to
which the polynomial $u(x)$ belongs is denoted by $B$, then (6) shows
that

$$AB = E$$

whence $B = A^{-1}$. This is proof of the existence of an inverse class
for every nonzero class; in other words, this completes the proof
that classes form a field.

We will denote this field by $\bar P$ and will show that it is an
extension of the field $P$. With every element $a$ of the field $P$ is
associated a class composed of polynomials which upon division by $f(x)$
yield a remainder $a$; the element $a$ itself, regarded as a zero-degree
polynomial, belongs to this class. All classes of this special type
constitute, in the field $\bar P$, a subfield that is isomorphic to the
field $P$. Indeed, the one-to-one nature of the correspondence is
obvious; on the other hand, for representatives in these classes we can
choose elements of the field $P$ and therefore with the sum (product)
of elements of $P$ is associated a sum (product) of corresponding
classes. Consequently, in the future we will not need to distinguish
between the elements of the field $P$ and the classes corresponding to
them.

Finally, use $X$ to denote the class made up of polynomials
which upon division by $f(x)$ yield the remainder $x$. This class is
a definite element of the field $\bar P$, and we wish to demonstrate
that it is a root of the polynomial $f(x)$. Let

$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1}x + a_n$$

We denote by $A_i$ the class corresponding, in the foregoing sense, to
the element $a_i$ of the field $P$, $i = 0, 1, \dots, n$, and will find
out what the element

$$A_0 X^n + A_1 X^{n-1} + \dots + A_{n-1}X + A_n \tag{7}$$

of the field $\bar P$ is equal to. Assuming elements $a_i$,
$i = 0, 1, \dots, n$, to be representatives of the classes $A_i$ and the
polynomial $x$ to be a representative of the class $X$, and using the
definition of addition and multiplication of classes, we find that the
polynomial $f(x)$ is itself contained in class (7). However, $f(x)$ is
exactly divisible by itself and therefore class (7) turns out to be the
zero class. Thus, by replacing in (7) the classes $A_i$ by the elements
$a_i$ of $P$ corresponding to them, we find that the following equality
holds in the field $\bar P$:

$$a_0 X^n + a_1 X^{n-1} + \dots + a_{n-1}X + a_n = 0$$

That is to say, the class $X$ is indeed a root of the polynomial $f(x)$.

This completes the proof of the theorem on the existence of a
root. Note that by taking the field of real numbers for $P$ and setting
$f(x) = x^2 + 1$, we obtain yet another method for constructing the
field of complex numbers.
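
The remark about complex numbers can be checked numerically. In the
sketch below, which is only an illustration, a class modulo $x^2 + 1$ is
stored as the pair $(a, b)$ of coefficients of its remainder $a + bx$;
the function name is hypothetical.

```python
def mul_mod_x2_plus_1(p, q):
    """Product of the classes of a + b*x and c + d*x modulo x^2 + 1."""
    a, b = p
    c, d = q
    # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2
    c0, c1, c2 = a * c, a * d + b * c, b * d
    # x^2 = (x^2 + 1) - 1, so the remainder is (c0 - c2) + c1 x
    return (c0 - c2, c1)

X = (0, 1)                                   # the class of the polynomial x
print(mul_mod_x2_plus_1(X, X))               # the class of -1, so X^2 + 1 = 0
print(mul_mod_x2_plus_1((2, 3), (1, -4)))    # same numbers as (2 + 3i)(1 - 4i)
```

The class $X$ plays the role of the imaginary unit: the product rule for
pairs is exactly the rule for multiplying complex numbers.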

Certain corollaries can be derived from the theorem on the exi- 
stence of a root similar to those derived in Sec. 24 from the funda- 
mental theorem of the algebra of complex numbers. One remark is 
in order first, however. Since any linear factor x — c of a polyno- 
mial f (x) is irreducible, it must appear in the unique factorization 
of f (x) into irreducible factors. 

However, the number of linear factors in the factorization of 
f (x) into irreducible factors cannot exceed the degree of the poly- 
nomial. We get the following result. 

A polynomial $f(x)$ of degree $n$ cannot have more than $n$ roots in
the field $P$, even if each of the roots is counted with its multiplicity.

We use the term splitting field of a polynomial $f(x)$ of degree
$n$ over the field $P$ for an extension $Q$ of $P$ that contains $n$
roots of $f(x)$ (counting multiplicity in the case of multiple roots).
Consequently, over the field $Q$ the polynomial $f(x)$ will decompose
into linear factors, and no further extension of the field $Q$ can make
new roots appear for $f(x)$.

For every polynomial f (x) in the ring P [x] there is a splitting 
field over the field P. 

Indeed, if a polynomial $f(x)$ of degree $n$, $n \geq 1$, has $n$ roots
in the field $P$ itself, then $P$ will be the desired splitting field.
But if $f(x)$ does not decompose into linear factors over $P$, then we
take one of its nonlinear irreducible factors $\varphi(x)$ and, on the
basis of the theorem on the existence of a root, we extend $P$ to the
field $P'$, which contains a root of $\varphi(x)$. If the polynomial
$f(x)$ still does not break up into linear factors over $P'$, we again
extend the field, thus creating a root for one more of the remaining
nonlinear irreducible factors. Each such step splits off at least one
new linear factor, so in a finite number of steps we will arrive at a
splitting field for $f(x)$.

Quite naturally, $f(x)$ can have many different splitting fields.
One can prove that all the minimal fields containing the field $P$
and $n$ roots of the polynomial $f(x)$ (where $n$ is the degree of the
polynomial) are isomorphic. However, we will not make use of this
assertion and will therefore not give the proof.

Multiple roots. In the previous section we proved that a polynomial
$f(x)$ over a field $P$ of characteristic 0 does not have multiple
factors if and only if it is relatively prime to its derivative; it was
also noted that the absence, in $f(x)$, of multiple factors over $P$
implies the absence of such factors over any extension $\bar P$ of the
field $P$. Let us apply this to the case when $\bar P$ is a splitting
field for $f(x)$; recalling the definition of a multiple root, we arrive
at the following result.

If a polynomial f (x) over a field P of characteristic 0 does not have 
multiple roots in the given splitting field, then it is relatively prime 
to its derivative f' (x). Conversely, if f (x) is relatively prime to its deri- 
vative, then it does not have multiple roots in any one of its splitting 
fields. 

Whence, in particular, it follows that a polynomial f (x) which 
is irreducible over a field P of characteristic 0, cannot have multiple 
roots in any extension of the field. This assertion does not hold in 
fields of a finite characteristic. This circumstance plays a perceptible 
role in the general theory of fields. 
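
Since relative primality does not depend on the choice of field, the
criterion can be tested inside $P$ itself, without constructing any
splitting field. The following Python sketch assumes that $P$ is the
field of rational numbers (characteristic 0) and stores polynomials as
coefficient lists, lowest degree first; the function names are
hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros; coefficient lists run from degree 0 upward."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    # coefficient of x^(i-1) in f'(x) is i * (coefficient of x^i in f)
    return trim(i * c for i, c in enumerate(p) if i > 0)

def rem(a, b):
    """Remainder of a on division by a nonzero, trimmed polynomial b."""
    r = trim(a)
    while len(r) >= len(b):
        c = r[-1] / b[-1]
        shift = len(r) - len(b)
        for i, bi in enumerate(b):
            r[shift + i] -= c * bi
        r = trim(r)
    return r

def gcd(a, b):
    """Euclidean algorithm; the result is defined up to a constant factor."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, rem(a, b)
    return a

def has_multiple_root(f):
    """True if f and f' have a common divisor of positive degree."""
    f = trim(Fraction(c) for c in f)
    return len(gcd(f, derivative(f))) > 1

print(has_multiple_root([2, -3, 0, 1]))   # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(has_multiple_root([-2, 0, 0, 1]))   # x^3 - 2, irreducible over the rationals
```

The first polynomial has the double root 1, and its greatest common
divisor with the derivative $3x^2 - 3$ is a constant multiple of $x - 1$;
the second is relatively prime to its derivative.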

Note in conclusion that for an arbitrary field, the Vieta formulas
hold too (see Sec. 24); here, the roots of the polynomial are taken
in some splitting field of this polynomial.


50. The Field of Rational Fractions


The theory of rational fractions described in Sec. 25 holds in full
for the case of an arbitrary base field as well. However, when passing
from the field of real numbers to an arbitrary field $P$, the view taken
of the expression $\frac{f(x)}{g(x)}$ as a function of the variable $x$
must be rejected, for, as we know, it is not applicable to polynomials.
Our job here is to figure out the meaning of these expressions for the
case when the coefficients belong to an arbitrary field $P$. More
precisely, we want to construct a field containing the polynomial ring
$P[x]$ and in such a way that the operations of addition and
multiplication defined in the new field coincide, as applied to
polynomials, with the operations in the ring $P[x]$; in short, the ring
$P[x]$ must be a subring of this new field. On the other hand, any
element of the new field must be representable (in the sense of
division as defined in this field) in the form of a quotient of two
polynomials. As will now be shown, such a field can be constructed for
any $P$. We denote it by $P(x)$ (the unknown is in the parentheses) and
call it the field of rational fractions over the field $P$.

First assume that the ring $P[x]$ is already a subring of some field
$Q$. If $f(x)$ and $g(x)$ are arbitrary polynomials from $P[x]$, and
$g(x) \neq 0$, then there is, in the field $Q$, a uniquely defined
element equal to the quotient obtained by the division of $f(x)$ by
$g(x)$. Denoting this element by $\frac{f(x)}{g(x)}$, as is the usual
way in the case of a field, we can write the following equation on the
basis of the definition of a quotient:

$$f(x) = g(x) \cdot \frac{f(x)}{g(x)} \tag{1}$$

where the product is to be understood in the sense of multiplication
in the field $Q$. It may happen that some quotients
$\frac{f(x)}{g(x)}$ and $\frac{\varphi(x)}{\psi(x)}$ are one and the
same element of $Q$. The condition for this is the ordinary condition
of equality of fractions:

$$\frac{f(x)}{g(x)} = \frac{\varphi(x)}{\psi(x)} \quad \text{if and only if} \quad f(x)\psi(x) = g(x)\varphi(x)$$

Indeed, if $\frac{f(x)}{g(x)} = \frac{\varphi(x)}{\psi(x)} = \alpha$,
then, by (1),

$$f(x) = g(x)\alpha, \qquad \varphi(x) = \psi(x)\alpha$$

whence

$$f(x)\psi(x) = g(x)\psi(x)\alpha = g(x)\varphi(x)$$

Conversely, if $f(x)\psi(x) = g(x)\varphi(x) = u(x)$ in the sense of
multiplication in the ring $P[x]$, then, passing to the field $Q$, we
obtain the equalities

$$\frac{f(x)}{g(x)} = \frac{u(x)}{g(x)\psi(x)} = \frac{\varphi(x)}{\psi(x)}$$


Furthermore, it is easy to see that the sum and product of any
elements of $Q$, which are quotients of polynomials in $P[x]$, can again
be represented in the form of such quotients, and the ordinary rules
of addition and multiplication of fractions hold true:

$$\frac{f(x)}{g(x)} + \frac{\varphi(x)}{\psi(x)} = \frac{f(x)\psi(x) + g(x)\varphi(x)}{g(x)\psi(x)} \tag{2}$$

$$\frac{f(x)}{g(x)} \cdot \frac{\varphi(x)}{\psi(x)} = \frac{f(x)\varphi(x)}{g(x)\psi(x)} \tag{3}$$

Indeed, multiplying both sides of these equations by the product
$g(x)\psi(x)$ and applying (1), we get equalities which hold true
in the ring $P[x]$. The validity of (2) and (3) now follows from the
fact that, thanks to the absence of zero divisors in the field $Q$, both
sides of each of the resulting equalities may be reduced by the nonzero
element $g(x)\psi(x)$ without spoiling the equalities.

These preliminary remarks suggest the path we should take in
constructing the field $P(x)$. Suppose we have an arbitrary field $P$
and over it a polynomial ring $P[x]$. With every ordered pair of
polynomials $f(x)$, $g(x)$, where $g(x) \neq 0$, we associate the symbol
$\frac{f(x)}{g(x)}$, called a rational fraction with numerator $f(x)$
and denominator $g(x)$. We stress the fact that this is only a symbol
corresponding to the given pair of polynomials, since, generally
speaking, division of polynomials in the ring $P[x]$ itself is
impossible, and so far the ring $P[x]$ is not contained in any field.
Even if $g(x)$ is a divisor of $f(x)$, the new symbol
$\frac{f(x)}{g(x)}$ should for the time being be distinguished from the
polynomial obtained as the quotient in the division of $f(x)$ by $g(x)$.

We now call the rational fractions $\frac{f(x)}{g(x)}$ and
$\frac{\varphi(x)}{\psi(x)}$ equal,

$$\frac{f(x)}{g(x)} = \frac{\varphi(x)}{\psi(x)} \tag{4}$$

if in the ring $P[x]$ we have the equality
$f(x)\psi(x) = g(x)\varphi(x)$.
It is obvious that any fraction is equal to itself and that if one
fraction is equal to another, then the second one is equal to the first.
Let us prove the transitive property of this concept of equality. We
are given equalities (4) and

$$\frac{\varphi(x)}{\psi(x)} = \frac{u(x)}{v(x)} \tag{5}$$

From the equalities

$$f(x)\psi(x) = g(x)\varphi(x), \qquad \varphi(x)v(x) = \psi(x)u(x)$$

equivalent to them in the ring $P[x]$ it follows that

$$f(x)v(x)\psi(x) = g(x)\varphi(x)v(x) = g(x)u(x)\psi(x)$$

and therefore, after cancelling out the nonzero (as the denominator
of one of the fractions) polynomial $\psi(x)$, we get

$$f(x)v(x) = g(x)u(x)$$

whence, by the definition of the equality of fractions,

$$\frac{f(x)}{g(x)} = \frac{u(x)}{v(x)}$$

This completes the proof.

Now let us combine into one class all fractions equal to some 
one given fraction, and therefore (by virtue of the transitivity of the 
equality) equal among themselves. If one class has even a single 
fraction not contained in another class, then, as follows from the 
transitivity of the equality, these two classes do not have a single 
element in common. 

Thus, the collection of all rational fractions written by means 
of polynomials from the ring P [z] breaks up into disjoint classes of 
fractions equal among themselves. We would now like to define 
algebraic operations in this set of classes of equal fractions so that 
it becomes a field. To do this, we will define operations on rational 
fractions and will each time verify that the replacement of summands 
(or factors) by fractions equal to them replaces the sum (or product) 
also by an equal fraction. This will enable us to speak of the sum 
and product of classes of equal fractions. 

First, let us make the following remark which will be used
repeatedly in what follows. A rational fraction becomes an equal
fraction if its numerator and denominator are multiplied by one and the
same nonzero polynomial, or reduced by any common factor. Indeed,

$$\frac{f(x)}{g(x)} = \frac{f(x)h(x)}{g(x)h(x)}$$

since in the ring $P[x]$

$$f(x)[g(x)h(x)] = g(x)[f(x)h(x)]$$

We define the addition of rational fractions by formula (2);
since from $g(x) \neq 0$ and $\psi(x) \neq 0$ it follows that
$g(x)\psi(x) \neq 0$, the right member of this formula is indeed a
rational fraction. Furthermore, if it is given that

$$\frac{f(x)}{g(x)} = \frac{f_0(x)}{g_0(x)}, \qquad \frac{\varphi(x)}{\psi(x)} = \frac{\varphi_0(x)}{\psi_0(x)}$$

that is,

$$f(x)g_0(x) = g(x)f_0(x), \qquad \varphi(x)\psi_0(x) = \psi(x)\varphi_0(x) \tag{6}$$

then, by multiplying both members of the first of the equalities (6)
by $\psi(x)\psi_0(x)$, both members of the second equality by
$g(x)g_0(x)$ and then adding these equalities termwise, we obtain

$$[f(x)\psi(x) + g(x)\varphi(x)]\,g_0(x)\psi_0(x) = [f_0(x)\psi_0(x) + g_0(x)\varphi_0(x)]\,g(x)\psi(x)$$

which is equivalent to the equation

$$\frac{f(x)\psi(x) + g(x)\varphi(x)}{g(x)\psi(x)} = \frac{f_0(x)\psi_0(x) + g_0(x)\varphi_0(x)}{g_0(x)\psi_0(x)}$$

Thus, if we have two classes of equal fractions, the sum of any
fraction of one class and any fraction of the other class is equal to
any other such sum, that is to say, such sums lie in some definite
third class. This class is called the sum of the two given classes.


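The commutativity of this addition follows directly from (2);
the associativity is proved as follows:

$$\left[\frac{f(x)}{g(x)} + \frac{\varphi(x)}{\psi(x)}\right] + \frac{u(x)}{v(x)} = \frac{f(x)\psi(x) + g(x)\varphi(x)}{g(x)\psi(x)} + \frac{u(x)}{v(x)}$$
$$= \frac{f(x)\psi(x)v(x) + g(x)\varphi(x)v(x) + g(x)\psi(x)u(x)}{g(x)\psi(x)v(x)}$$
$$= \frac{f(x)}{g(x)} + \frac{\varphi(x)v(x) + \psi(x)u(x)}{\psi(x)v(x)} = \frac{f(x)}{g(x)} + \left[\frac{\varphi(x)}{\psi(x)} + \frac{u(x)}{v(x)}\right]$$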


From the definition of equality of fractions it is easy to derive
that all fractions of the form $\frac{0}{g(x)}$, i.e., fractions with
zero numerator, are equal and that they form a complete class of equal
fractions. We call this class the zero class and we will prove that in
our addition it plays the part of zero. Indeed, if we have an arbitrary
fraction $\frac{\varphi(x)}{\psi(x)}$, then

$$\frac{0}{g(x)} + \frac{\varphi(x)}{\psi(x)} = \frac{0 \cdot \psi(x) + g(x)\varphi(x)}{g(x)\psi(x)} = \frac{g(x)\varphi(x)}{g(x)\psi(x)} = \frac{\varphi(x)}{\psi(x)}$$

From the equation

$$\frac{f(x)}{g(x)} + \frac{-f(x)}{g(x)} = \frac{0}{g^2(x)}$$

the right side of which belongs to the zero class, it now follows that
the class of fractions equal to the fraction $\frac{-f(x)}{g(x)}$ will
be opposite to the class of fractions equal to the fraction
$\frac{f(x)}{g(x)}$. From this, as we know, follows the validity of
unique subtraction.
We define multiplication of rational fractions by formula (3);
since $g(x)\psi(x) \neq 0$, the right member of this formula will indeed
be a rational fraction. Furthermore, if

$$\frac{f(x)}{g(x)} = \frac{f_0(x)}{g_0(x)}, \qquad \frac{\varphi(x)}{\psi(x)} = \frac{\varphi_0(x)}{\psi_0(x)}$$

that is,

$$f(x)g_0(x) = g(x)f_0(x), \qquad \varphi(x)\psi_0(x) = \psi(x)\varphi_0(x)$$

then, by multiplying these latter equations termwise, we get

$$f(x)g_0(x)\varphi(x)\psi_0(x) = g(x)f_0(x)\psi(x)\varphi_0(x)$$

which is equivalent to the equation

$$\frac{f(x)\varphi(x)}{g(x)\psi(x)} = \frac{f_0(x)\varphi_0(x)}{g_0(x)\psi_0(x)}$$

Thus, by analogy with the above-defined sum of classes, we can speak
of a product of classes of equal fractions.

The commutativity and associativity of this multiplication follow
immediately from (3) and the validity of the distributive law is
proved as follows:

$$\left[\frac{f(x)}{g(x)} + \frac{\varphi(x)}{\psi(x)}\right] \cdot \frac{u(x)}{v(x)} = \frac{f(x)\psi(x) + g(x)\varphi(x)}{g(x)\psi(x)} \cdot \frac{u(x)}{v(x)}$$
$$= \frac{[f(x)\psi(x) + g(x)\varphi(x)]\,u(x)}{g(x)\psi(x)v(x)} = \frac{f(x)\psi(x)u(x) + g(x)\varphi(x)u(x)}{g(x)\psi(x)v(x)}$$
$$= \frac{f(x)\psi(x)u(x)v(x) + g(x)\varphi(x)u(x)v(x)}{g(x)\psi(x)v^2(x)} = \frac{f(x)u(x)}{g(x)v(x)} + \frac{\varphi(x)u(x)}{\psi(x)v(x)}$$
$$= \frac{f(x)}{g(x)} \cdot \frac{u(x)}{v(x)} + \frac{\varphi(x)}{\psi(x)} \cdot \frac{u(x)}{v(x)}$$


It is easy to see that fractions of the type $\frac{f(x)}{f(x)}$, i.e.,
fractions whose numerators are equal to the denominators, are equal and
constitute a separate class. This class is termed the unit class and in
our multiplication plays the role of unity:

$$\frac{f(x)}{f(x)} \cdot \frac{\varphi(x)}{\psi(x)} = \frac{f(x)\varphi(x)}{f(x)\psi(x)} = \frac{\varphi(x)}{\psi(x)}$$

Finally, if the fraction $\frac{f(x)}{g(x)}$ does not belong to the
zero class, i.e., $f(x) \neq 0$, then there is a fraction
$\frac{g(x)}{f(x)}$. Since

$$\frac{f(x)}{g(x)} \cdot \frac{g(x)}{f(x)} = \frac{f(x)g(x)}{g(x)f(x)}$$

and the right member of this equality belongs to the unit class, the
class of fractions equal to the fraction $\frac{g(x)}{f(x)}$ will be
inverse to the class of fractions equal to $\frac{f(x)}{g(x)}$. Whence
follows the validity of unique division.

Thus, the classes of equal rational fractions with coefficients from
the field $P$ constitute, in our definition of operations, a commutative
field. This is the desired field $P(x)$. Incidentally, we still have to
prove that this field which we have constructed contains a subring
isomorphic to the ring $P[x]$ and that every element of the field can
be represented as a quotient of two elements of this subring.

If we associate with an arbitrary polynomial $f(x)$ of the ring
$P[x]$ a class of rational fractions equal to the fraction
$\frac{f(x)}{1}$ (among all these fractions there are of course
fractions whose denominators are equal to unity), we obtain a
one-to-one mapping of the ring $P[x]$ into the field we have
constructed. Indeed, from the equality

$$\frac{f(x)}{1} = \frac{\varphi(x)}{1}$$

it would follow that $f(x) \cdot 1 = 1 \cdot \varphi(x)$, that is to say,
$f(x) = \varphi(x)$. This mapping will even be isomorphic, as the
following equations show:

$$\frac{f(x)}{1} + \frac{g(x)}{1} = \frac{f(x) \cdot 1 + g(x) \cdot 1}{1} = \frac{f(x) + g(x)}{1},$$

$$\frac{f(x)}{1} \cdot \frac{g(x)}{1} = \frac{f(x)g(x)}{1}$$

Thus, the classes of fractions equal to fractions of the form
$\frac{f(x)}{1}$ constitute, in our field, a subring that is isomorphic
to the ring $P[x]$. The fraction $\frac{f(x)}{1}$ can therefore be
denoted simply as $f(x)$. And finally, since for $g(x) \neq 0$, the
class of fractions equal to the fraction $\frac{1}{g(x)}$ is the inverse
of the class of fractions equal to the fraction $\frac{g(x)}{1}$, it
follows from the equality

$$\frac{f(x)}{1} \cdot \frac{1}{g(x)} = \frac{f(x)}{g(x)}$$

that all elements of our field may be considered (in the sense of
operations defined in this field) to be quotients of polynomials of the
ring $P[x]$.

Over an arbitrary field P we constructed the field of rational 
fractions P (x). Using this same method, we can construct the field 
of rational numbers by taking the ring of integers in place of the 
ring of polynomials. Combining these two cases and using the same 
kind of method, we could prove a theorem asserting that, generally, 
any commutative ring without divisors of zero is a subring of some 
field. 
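
The construction with integers in place of polynomials is short enough
to sketch in Python. The class name `Frac` and its layout are
hypothetical; the sketch only mirrors the definitions of this section:
a fraction is an ordered pair with nonzero denominator, equality is
tested by cross-multiplication as in (4), and the operations follow
formulas (2) and (3). No reduction to lowest terms is performed, so that
distinct representatives of one class remain visible.

```python
class Frac:
    """A fraction as an ordered pair (num, den) of integers, den != 0."""

    def __init__(self, num, den):
        if den == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        self.num, self.den = num, den

    def __eq__(self, other):
        # equality of fractions, as in (4): f*psi == g*phi
        return self.num * other.den == self.den * other.num

    def __add__(self, other):
        # formula (2)
        return Frac(self.num * other.den + self.den * other.num,
                    self.den * other.den)

    def __mul__(self, other):
        # formula (3)
        return Frac(self.num * other.num, self.den * other.den)

    def __repr__(self):
        return f"{self.num}/{self.den}"

# Replacing summands by equal fractions gives an equal sum.
a, a_equal = Frac(1, 2), Frac(2, 4)
b, b_equal = Frac(1, 3), Frac(3, 9)
print(a + b, a_equal + b_equal, a + b == a_equal + b_equal)
```

The two sums are written with different numerators and denominators but
are equal in the sense of (4), which is what allows one to speak of the
sum of classes.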


CHAPTER 11 


POLYNOMIALS 
IN SEVERAL UNKNOWNS 


51. The Ring of Polynomials in Several Unknowns


One often has to consider polynomials that depend on two, three,
and, generally, several unknowns. In the first chapters of this book
we studied linear and quadratic forms, which are examples of such
polynomials. Generally speaking, a polynomial $f(x_1, x_2, \dots, x_n)$
in $n$ unknowns $x_1, x_2, \dots, x_n$ over some field $P$ is the sum of
a finite number of terms of the form
$x_1^{k_1} x_2^{k_2} \dots x_n^{k_n}$, where all $k_i \geq 0$,
with coefficients from the field $P$. It is assumed, quite naturally,
that the polynomial $f(x_1, x_2, \dots, x_n)$ does not contain like
terms and that only terms with nonzero coefficients are considered. Two
polynomials in $n$ unknowns, $f(x_1, x_2, \dots, x_n)$ and
$g(x_1, x_2, \dots, x_n)$, are called equal (or identically equal) if
the coefficients of like terms are equal.

If a polynomial $f(x_1, x_2, \dots, x_n)$ is given over a field $P$,
then its degree with respect to the unknown $x_i$, $i = 1, 2, \dots, n$,
is the highest exponent with which $x_i$ appears in the terms of the
polynomial. This degree may happen to be 0, which means that although
$f$ is considered a polynomial in $n$ unknowns
$x_1, x_2, \dots, x_i, \dots, x_n$, the unknown $x_i$ does not actually
appear in the notation.

On the other hand, if we call the number $k_1 + k_2 + \dots + k_n$
(that is, the sum of the exponents of the unknowns) the degree of the
term

$$x_1^{k_1} x_2^{k_2} \dots x_n^{k_n}$$

then the degree of the polynomial $f(x_1, x_2, \dots, x_n)$ (that is,
the degree in the unknowns taken together) is the highest degree of its
terms. In particular, as in the case of one unknown, only nonzero
elements from the field $P$ will be polynomials of degree zero. On the
other hand, as in the case of polynomials in one unknown, zero will be
the only polynomial in $n$ unknowns whose degree is not defined. Of
course, a polynomial can in the general case contain several
highest-degree terms and therefore one cannot speak of the
highest-degree term of a polynomial.
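
The two notions of degree can be made concrete with a small Python
sketch. The representation is an assumption made for illustration only:
a nonzero polynomial in $n$ unknowns is stored as a dictionary mapping
each exponent tuple $(k_1, \dots, k_n)$ to its nonzero coefficient, and
the function names are hypothetical.

```python
def degree_in(f, i):
    """Degree of f with respect to the unknown x_{i+1} (index i from 0)."""
    return max(exps[i] for exps in f)

def total_degree(f):
    """Degree of f in the unknowns taken together."""
    return max(sum(exps) for exps in f)

# Assumed example: 3*x1*x2^2 - 7*x1^2*x3^2 + x2 in three unknowns.
f = {(1, 2, 0): 3, (2, 0, 2): -7, (0, 1, 0): 1}
print(degree_in(f, 0), degree_in(f, 1), degree_in(f, 2), total_degree(f))
```

Both functions raise an error on the empty dictionary, which matches
the convention that the degree of the zero polynomial is not defined.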

The operations of addition and multiplication are defined as
follows for polynomials in $n$ unknowns over a field $P$. The sum of the
polynomials $f(x_1, x_2, \dots, x_n)$ and $g(x_1, x_2, \dots, x_n)$ is a
polynomial whose coefficients are obtained by adding the corresponding
coefficients of the polynomials $f$ and $g$; if some term occurs in only
one of the polynomials $f$, $g$, then its coefficient in the other
polynomial is naturally taken to be zero. The product of two "monomials"
is defined by the equation

$$a x_1^{k_1} x_2^{k_2} \dots x_n^{k_n} \cdot b x_1^{l_1} x_2^{l_2} \dots x_n^{l_n} = (ab)\, x_1^{k_1 + l_1} x_2^{k_2 + l_2} \dots x_n^{k_n + l_n}$$

after which the product of the polynomials $f(x_1, x_2, \dots, x_n)$ and
$g(x_1, x_2, \dots, x_n)$ is defined as the result of a termwise
multiplication and subsequent collecting of like terms.
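
As an illustration of these definitions, here is a minimal Python
sketch. It assumes, purely for the example, that a polynomial is stored
as a dictionary from exponent tuples $(k_1, \dots, k_n)$ to coefficients;
the function names are hypothetical.

```python
from collections import defaultdict

def poly_add(f, g):
    """Add coefficients of like terms; a missing term counts as zero."""
    total = defaultdict(int)
    for poly in (f, g):
        for exps, coeff in poly.items():
            total[exps] += coeff
    return {e: c for e, c in total.items() if c != 0}

def poly_mul(f, g):
    """Termwise multiplication followed by collecting like terms."""
    prod = defaultdict(int)
    for ef, a in f.items():
        for eg, b in g.items():
            exps = tuple(k + l for k, l in zip(ef, eg))  # exponents add
            prod[exps] += a * b                          # coefficients multiply
    return {e: c for e, c in prod.items() if c != 0}

# Assumed example in two unknowns: (x1 + x2)(x1 - x2) = x1^2 - x2^2.
f = {(1, 0): 1, (0, 1): 1}
g = {(1, 0): 1, (0, 1): -1}
print(poly_mul(f, g))
```

The two mixed terms $x_1 x_2$ cancel when like terms are collected, so
only terms with nonzero coefficients are kept, in line with the
convention stated at the beginning of the section.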

Given this definition of operations, the collection of polynomials
in $n$ unknowns over the field $P$ becomes a commutative ring, which does
not contain divisors of zero. Indeed, for $n = 1$ our definitions
coincide with those which were given in Sec. 20 for the case of
polynomials in one unknown. Suppose it has already been proved that the
polynomials in $n - 1$ unknowns $x_1, x_2, \dots, x_{n-1}$ with
coefficients from the field $P$ constitute a ring without divisors of
zero. Any polynomial in $n$ unknowns $x_1, x_2, \dots, x_{n-1}, x_n$ may
be uniquely represented as a polynomial in the unknown $x_n$ with
coefficients which are polynomials in $x_1, x_2, \dots, x_{n-1}$;
conversely, any polynomial in $x_n$ with coefficients from the ring of
polynomials in $x_1, x_2, \dots, x_{n-1}$ over the field $P$ may of
course be regarded as a polynomial over this same field $P$ with respect
to the entire collection of unknowns $x_1, x_2, \dots, x_{n-1}, x_n$. It
may readily be verified that the one-to-one correspondence we have
obtained between the polynomials in $n$ unknowns and the polynomials in
one unknown over the ring of polynomials in $n - 1$ unknowns is
isomorphic with respect to the operations of addition and
multiplication. The assertion being proved follows now from the fact
that polynomials in one unknown over the ring of polynomials in $n - 1$
unknowns themselves constitute a ring, and, as a ring of polynomials in
one unknown over a ring without zero divisors, it does not itself
contain any divisors of zero (see Sec. 47).

Consequently, we have proved the existence of a ring of polynomials
in $n$ unknowns over the field $P$. This ring is denoted by the symbol
$P[x_1, x_2, \dots, x_n]$.

The following considerations permit regarding the ring of
polynomials in $n$ unknowns from a somewhat different angle. Let a field
$P$ be contained in some commutative ring $L$ as a subring. In $L$ take
$n$ elements $\alpha_1, \alpha_2, \dots, \alpha_n$ and find the minimal
subring $L'$ of the ring $L$ which contains these elements and also the
entire field $P$, that is, the subring obtained by adjoining the
elements $\alpha_1, \alpha_2, \dots, \alpha_n$ to the field $P$. The
subring $L'$ consists of all elements of the ring $L$ which are
expressed in terms of the elements $\alpha_1, \alpha_2, \dots, \alpha_n$
and the elements of the field $P$ by means of addition, subtraction and
multiplication. It is easy to see that what we have are precisely those
elements of the ring $L$ which may be written (with the aid of the
operations occurring in $L$) in the form of polynomials in
$\alpha_1, \alpha_2, \dots, \alpha_n$ with coefficients from $P$; these
elements, being elements of the ring $L$, will add and multiply
precisely in accord with the rules of addition and multiplication of
polynomials in $n$ unknowns.

Of course, speaking generally, a given element $\beta$ of the subring
$L'$ will possess many different notations in the form of a polynomial
in $\alpha_1, \alpha_2, \dots, \alpha_n$ with coefficients from the
field $P$. If for any $\beta$ in $L'$ such a notation is unique, i.e.,
if the different polynomials in $\alpha_1, \alpha_2, \dots, \alpha_n$
are distinct elements of the ring $L'$ (and, hence, of the ring $L$),
then the system of elements $\alpha_1, \alpha_2, \dots, \alpha_n$ is
called algebraically independent over the field $P$, otherwise it is
algebraically dependent.* From this we can draw the following
conclusion.

If the field $P$ is a subring of a commutative ring $L$ and if the
system of elements $\alpha_1, \alpha_2, \dots, \alpha_n$ of $L$ is
algebraically independent over $P$, then the subring $L'$ of the ring
$L$ generated by adjoining to $P$ the elements
$\alpha_1, \alpha_2, \dots, \alpha_n$ is isomorphic to the polynomial
ring $P[x_1, x_2, \dots, x_n]$.

Of the other properties of the ring $P[x_1, x_2, \dots, x_n]$ of
polynomials in $n$ unknowns we indicate the following: this ring may be
included in the field $P(x_1, x_2, \dots, x_n)$ of rational fractions in
$n$ unknowns over the field $P$. Every element of this field can be
written as $\frac{f}{g}$, where $f$ and $g$ are polynomials of the ring
$P[x_1, x_2, \dots, x_n]$; then $\frac{f}{g} = \frac{\varphi}{\psi}$ if
and only if $f\psi = g\varphi$. Addition and multiplication of these
rational fractions is performed by the rules which, as indicated in
Sec. 45, hold true for quotients in any field. The existence proof of
the field $P(x_1, x_2, \dots, x_n)$ is carried out just as it was in
Sec. 50 for the case $n = 1$.

We can construct a theory of divisibility for polynomials in several
unknowns that generalizes the theory of divisibility for polynomials
in one unknown, which we studied in Chapters 5 and 10. However, since
we do not intend to go into a detailed study of the ring of
polynomials in several unknowns, we will confine ourselves to the
problem of factoring a polynomial into irreducible factors.

* The appropriate concepts for the case $n = 1$ were introduced in
Sec. 47: there, an element $\alpha$, algebraically independent over
the field $P$ in the sense of the foregoing definition, was called
transcendental over $P$; otherwise it was algebraic over $P$.

First let us introduce the following concept: if all terms of a
polynomial $f(x_1, x_2, \ldots, x_n)$ have one and the same degree $s$,
then it is called a homogeneous polynomial or, briefly, a form of
degree $s$; we are acquainted with linear and quadratic forms, and we
could consider cubic forms, all terms of which are of degree 3 in the
unknowns taken together, etc. Any polynomial in $n$ unknowns can be
uniquely represented as a sum of several forms in these unknowns, the
latter having various degrees. To obtain the desired representation,
all we need to do is combine all terms of the same degree. For
example, a polynomial of degree four

$$f(x_1, x_2, x_3) = 3x_1 x_2^2 - 7x_1^2 x_3^2 + x_2 - 5x_1 x_2 x_3 + x_1^4 - 2x_3 - 6 + x_3^3$$

is the sum of the quartic form $x_1^4 - 7x_1^2 x_3^2$, the cubic form
$3x_1 x_2^2 - 5x_1 x_2 x_3 + x_3^3$, the linear form $x_2 - 2x_3$ and
the constant term (a form of degree zero) $-6$.
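The grouping of terms by degree is easy to mechanize. The following
Python sketch is an illustration under stated assumptions, not part of
the text: a polynomial is stored as a dictionary from exponent tuples
$(k_1, \ldots, k_n)$ to coefficients, and both the function name
`homogeneous_parts` and the sample polynomial are hypothetical.

```python
from collections import defaultdict

def homogeneous_parts(poly):
    """Group the terms of a polynomial by total degree.

    poly maps exponent tuples (k1, ..., kn) to nonzero coefficients.
    Returns a dict: degree -> form of that degree (same representation).
    """
    parts = defaultdict(dict)
    for exps, coeff in poly.items():
        parts[sum(exps)][exps] = coeff
    return dict(parts)

# Hypothetical sample in two unknowns:
# x1^2*x2 - 4*x1*x2 + x2^2 + 3*x1 - 5
sample = {(2, 1): 1, (1, 1): -4, (0, 2): 1, (1, 0): 3, (0, 0): -5}
forms = homogeneous_parts(sample)
for degree in sorted(forms, reverse=True):
    print(degree, forms[degree])
```

Each term lands in exactly one form, which mirrors the uniqueness of
the representation stated above.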

Let us now prove the following theorem. 

The degree of a product of two nonzero polynomials in n unknowns 
is equal to the sum of the degrees of the polynomials. 

First suppose that we have the forms $\varphi(x_1, x_2, \ldots, x_n)$
of degree $s$ and $\psi(x_1, x_2, \ldots, x_n)$ of degree $t$. The
product of any term of the form $\varphi$ by any term of the form
$\psi$ will obviously have the degree $s + t$, and so the product
$\varphi\psi$ will be a form of degree $s + t$, since collecting like
terms cannot make all the coefficients of this product vanish due to
the absence of divisors of zero in the ring $P[x_1, x_2, \ldots, x_n]$.

If we are now given arbitrary polynomials $f(x_1, x_2, \ldots, x_n)$
and $g(x_1, x_2, \ldots, x_n)$ of degrees $s$ and $t$, respectively,
then, by representing each of them as a sum of forms of different
degrees, we get

$$f(x_1, x_2, \ldots, x_n) = \varphi(x_1, x_2, \ldots, x_n) + \ldots,$$
$$g(x_1, x_2, \ldots, x_n) = \psi(x_1, x_2, \ldots, x_n) + \ldots,$$

where $\varphi$ and $\psi$ are, respectively, forms of degrees $s$ and
$t$, and the dots stand for sums of forms of lower degrees. Then

$$fg = \varphi\psi + \ldots$$

By what has been proved, the form $\varphi\psi$ is of degree $s + t$,
and since all terms replaced by dots are of lower degree, the degree
of the product $fg$ will be equal to $s + t$. The theorem is proved.

The polynomial $\varphi$ is called the divisor of the polynomial $f$,
and $f$ is the dividend which is divided by $\varphi$, if in the ring
$P[x_1, x_2, \ldots, x_n]$ there is a polynomial $\psi$ such that
$f = \varphi\psi$. It is easy to see that the divisibility properties
I-IX (Sec. 21) are preserved in this general case as well. A
polynomial $f$ of degree $k$, $k \ge 1$, is called reducible over a
field $P$ if it can be decomposed into a product of polynomials from
the ring $P[x_1, x_2, \ldots, x_n]$ whose degrees are less than $k$.
Otherwise it is an irreducible polynomial.

Any polynomial in the ring $P[x_1, x_2, \ldots, x_n]$ having a nonzero
degree can be decomposed into a product of irreducible factors. This
decomposition (factorization) is unique to within factors of degree
zero.

This theorem generalizes the corresponding results of Sec. 48 which
refer to polynomials in one unknown. The first assertion is proved by
repeating exactly the reasoning of Sec. 48. The proof of the second
assertion is much more difficult. Before attempting it, we note that
from the second assertion of this theorem there follows a corollary:
if the product of two polynomials $f$ and $g$ from the ring
$P[x_1, x_2, \ldots, x_n]$ is divisible by an irreducible polynomial
$p$, then at least one of these polynomials is divisible by $p$. This
is so, for otherwise we would have, for the product $fg$, two
decompositions into irreducible factors, one of which contains $p$ and
the other does not.

Suppose the theorem has been proved for polynomials in $n$ unknowns
and we wish to prove it for a polynomial in $n + 1$ unknowns
$x, x_1, x_2, \ldots, x_n$. Write this polynomial as $\varphi(x)$. Its
coefficients will consequently be polynomials in
$x_1, x_2, \ldots, x_n$. For these coefficients the theorem has
already been proved, that is to say, each of them can be uniquely
decomposed into a product of irreducible factors. Let us call
$\varphi(x)$ a primitive polynomial (more exactly, primitive over the
ring $P[x_1, x_2, \ldots, x_n]$), if its coefficients do not contain a
single common irreducible factor, that is to say, are all relatively
prime, and let us prove the following lemma (Gauss' lemma).

The product of two primitive polynomials is itself primitive. 

Indeed, suppose we have the primitive polynomials

$$f(x) = a_0 x^k + a_1 x^{k-1} + \ldots + a_i x^{k-i} + \ldots + a_k,$$
$$g(x) = b_0 x^l + b_1 x^{l-1} + \ldots + b_j x^{l-j} + \ldots + b_l,$$

with coefficients from the ring $P[x_1, x_2, \ldots, x_n]$ and let

$$f(x) g(x) = c_0 x^{k+l} + c_1 x^{k+l-1} + \ldots + c_{i+j} x^{k+l-(i+j)} + \ldots + c_{k+l}$$

If this product is not primitive, then the coefficients
$c_0, c_1, \ldots, c_{k+l}$ will have a common irreducible factor
$p = p(x_1, x_2, \ldots, x_n)$. Since all the coefficients of the
primitive polynomial $f(x)$ cannot be divisible by $p$, let the
coefficient $a_i$ be the first that is not divisible by $p$;
similarly, by $b_j$ denote the first coefficient of the polynomial
$g(x)$ that is not divisible by $p$. Multiplying $f(x)$ and $g(x)$
termwise and collecting terms in $x^{k+l-(i+j)}$, we get

$$c_{i+j} = a_i b_j + a_{i-1} b_{j+1} + a_{i-2} b_{j+2} + \ldots + a_{i+1} b_{j-1} + a_{i+2} b_{j-2} + \ldots$$

The left member is divisible by the irreducible polynomial $p$. All
terms of the right member (except the first) are also definitely
divisible by $p$. Indeed, by the conditions imposed on the choice of
$i$ and $j$, all coefficients $a_{i-1}, a_{i-2}, \ldots$, and also
$b_{j-1}, b_{j-2}, \ldots$ are divisible by $p$. From this it follows
that the product $a_i b_j$ is also divisible by $p$ and therefore, as
noted above, at least one of the polynomials $a_i$, $b_j$ must be
divisible by $p$, which however is not the case. This completes the
proof of the lemma, under the assumption that the fundamental theorem
for polynomials in $n$ unknowns holds true.
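The same argument applies word for word when the coefficients are
ordinary integers instead of polynomials in $x_1, x_2, \ldots, x_n$,
since the integers also have unique factorization. The Python sketch
below checks that classical integer analogue on one pair of
polynomials. It is an illustration only: the change of coefficient
ring is an assumption made to keep the code short, and the names
`content` and `mul` are hypothetical.

```python
from functools import reduce
from math import gcd

def content(p):
    """Greatest common divisor of the integer coefficients of p."""
    return reduce(gcd, (abs(c) for c in p), 0)

def mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

f = [6, 10, 15]   # 6x^2 + 10x + 15: no prime divides all three coefficients
g = [4, 9]        # 4x + 9
h = mul(f, g)

# A polynomial is primitive exactly when its content equals 1.
print(content(f), content(g), content(h))
```

Note that every coefficient of `f` is divisible by some prime, yet no
single prime divides all of them; primitivity concerns common factors
only.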

As we know, the ring $P[x_1, x_2, \ldots, x_n]$ is contained in the
field of rational fractions $P(x_1, x_2, \ldots, x_n)$, which we will
denote by $Q$:

$$Q = P(x_1, x_2, \ldots, x_n)$$

Let us consider the polynomial ring $Q[x]$. If the polynomial
$\varphi(x)$ belongs to this ring, then each coefficient of it can be
represented as a quotient of polynomials from the ring
$P[x_1, x_2, \ldots, x_n]$. Taking out the common denominator of these
quotients and then taking out the common factors from the numerators,
we can represent $\varphi(x)$ as

$$\varphi(x) = \frac{a}{b} f(x)$$

Here, $a$ and $b$ are polynomials of the ring
$P[x_1, x_2, \ldots, x_n]$ and $f(x)$ is a polynomial in $x$ with
coefficients from $P[x_1, x_2, \ldots, x_n]$; it is even a primitive
polynomial since its coefficients do not have common factors.

In this way, we associate with every polynomial $\varphi(x)$ of the
ring $Q[x]$ a primitive polynomial $f(x)$. For the given polynomial
$\varphi(x)$, the polynomial $f(x)$ is defined uniquely to within a
nonzero factor in the field $P$. Indeed, let

$$\varphi(x) = \frac{a}{b} f(x) = \frac{c}{d} g(x)$$

where $g(x)$ is again a primitive polynomial. Then

$$ad\, f(x) = bc\, g(x)$$

Thus, $ad$ and $bc$ are obtained by taking out all common factors from
the coefficients of one and the same polynomial over the ring
$P[x_1, x_2, \ldots, x_n]$. Whence it follows, due to the validity, in
this ring (on the induction hypothesis), of the unique factorization
theorem, that $ad$ and $bc$ can differ only by a factor of degree
zero. Hence, the primitive polynomials $f(x)$ and $g(x)$ differ by the
same factor.

The product of two polynomials from the ring $Q[x]$ is associated with
the product of the primitive polynomials corresponding to them.
Indeed, if

$$\varphi(x) = \frac{a}{b} f(x), \qquad \psi(x) = \frac{c}{d} g(x)$$

where $f(x)$ and $g(x)$ are primitive polynomials, then

$$\varphi(x)\psi(x) = \frac{ac}{bd} f(x) g(x)$$

But, as was proved above, the product $f(x) g(x)$ is a primitive
polynomial.

Furthermore, note that if the polynomial $\varphi(x)$ from the ring
$Q[x]$ is irreducible over the field $Q$, then the corresponding
primitive polynomial $f(x)$, regarded as a polynomial in
$x, x_1, x_2, \ldots, x_n$, is also irreducible, and conversely.
Indeed, if the polynomial $f$ is reducible, $f = f_1 f_2$, then both
factors must contain the unknown $x$, since otherwise the polynomial
$f$ would not be primitive, whence follows the decomposition of the
polynomial $\varphi(x)$ over the field $Q$:

$$\varphi(x) = \frac{a}{b} f(x) = \left( \frac{a}{b} f_1 \right) f_2$$

Conversely, if the polynomial $\varphi(x)$ is reducible over $Q$,
$\varphi(x) = \varphi_1(x) \varphi_2(x)$, then the primitive
polynomials $f_1(x)$ and $f_2(x)$, corresponding to the polynomials
$\varphi_1(x)$ and $\varphi_2(x)$, will both contain $x$, but their
product, as was proved above, is equal to $f(x)$ (to within a factor
from the field $P$).

Now let us take a primitive polynomial $f$ and factor it into
irreducible factors, $f = f_1 f_2 \ldots f_k$. Not only must all these
factors contain the unknown $x$, they will even be primitive
polynomials, for otherwise the polynomial $f$ would not be primitive.
This factorization of the primitive polynomial $f$ is unique to within
factors from the field $P$. True enough, due to the preceding lemma,
we can regard this factorization as a factorization of $f(x)$ into
irreducible factors over the field $Q$, but we already know of the
uniqueness of factorization of polynomials in one unknown over some
field; this uniqueness occurs to within factors from $Q$. However, in
our case, due to the primitivity of all factors $f_i$, it will be to
within factors from $P$.

After these lemmas, proved by induction, the proof of our fundamental
theorem does not present any difficulties. Indeed, any irreducible
polynomial in the ring $P[x, x_1, x_2, \ldots, x_n]$ will either be an
irreducible polynomial from the ring $P[x_1, x_2, \ldots, x_n]$ or an
irreducible primitive polynomial. From this it follows that if we have
some factorization of the polynomial
$\varphi(x, x_1, x_2, \ldots, x_n)$ into irreducible factors, then, by
combining factors, we can represent $\varphi$ as

$$\varphi(x, x_1, x_2, \ldots, x_n) = a(x_1, x_2, \ldots, x_n)\, f(x, x_1, x_2, \ldots, x_n)$$

where $a$ is independent of $x$, and $f$ is a primitive polynomial.
However, we know that this factorization of $\varphi$ is unique to
within factors from $P$. On the other hand, since for the polynomial
$a$ in $n$ unknowns the uniqueness of factorization into irreducible
factors holds by the induction hypothesis, and, for the primitive
polynomial $f$, was proved in the preceding lemma, the proof of our
theorem for the case of $n + 1$ unknowns is also complete.

An interesting corollary stems from the lemmas proved above: if a
polynomial $\varphi(x)$ with coefficients in
$P[x_1, x_2, \ldots, x_n]$ is reducible over the field
$Q = P(x_1, x_2, \ldots, x_n)$ then it can be factored into factors
dependent on $x$ and having, as coefficients, polynomials from the
ring $P[x_1, x_2, \ldots, x_n]$. Indeed, if to the polynomial
$\varphi(x)$ there corresponds a primitive polynomial $f(x)$, that is,
$\varphi(x) = a f(x)$, then, as we know, the factorability of $f(x)$
follows from the factorability of $\varphi(x)$. But this latter fact
leads to the factorization of $\varphi(x)$ over the ring
$P[x_1, x_2, \ldots, x_n]$.

In contrast to the case of polynomials in one unknown, which, as we
know from Sec. 49, can be factored into linear factors over an
appropriately chosen extension of the base field under consideration,
there exist over any field $P$ absolutely irreducible polynomials of
arbitrary degree in several (two or more) unknowns, that is to say,
polynomials that remain irreducible under any extension of the field.

Such, for instance, is the polynomial

$$f(x, y) = \varphi(x) + y$$

where $\varphi(x)$ is an arbitrary polynomial in one unknown over the
field $P$. Indeed, if there were a factorization

$$f(x, y) = g(x, y)\, h(x, y)$$

in some extension $\bar{P}$ of the field $P$, then, by writing $g$ and
$h$ in terms of powers of $y$, we would have, say,

$$g(x, y) = a_0(x) y + a_1(x), \qquad h(x, y) = b_0(x)$$

that is, $h$ is not dependent on $y$; and then, because
$a_0(x) b_0(x) = 1$, we would have that $b_0(x)$ has degree 0, i.e.,
$h$ is not dependent on $x$ either.

Alphabetical order of the terms of a polynomial. For polynomials in
one unknown, we have two natural ways of arranging the terms: in
descending and in ascending powers of the unknown. This is not
possible for polynomials in several unknowns. If we have a polynomial
of degree five in three unknowns,


f (i, Zo £3) = qre? + airs + ri + diras 
it may also be written as 
Í (t1, To, T3) i x23 + LiTot3 zep LLT + ait, 
and there is no reason to prefer one notation to the other. There is,
however, a very definite way of ordering the terms of a polynomial in
several unknowns; it depends incidentally on the manner in which the
unknowns are numbered. For polynomials in one unknown it reduces to
ordering the terms in descending powers of the unknown. It is known as
the alphabetical method.


Suppose we have a polynomial $f(x_1, x_2, \ldots, x_n)$ in the ring
$P[x_1, x_2, \ldots, x_n]$ and two distinct terms of the polynomial

$$x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n} \tag{1}$$

$$x_1^{l_1} x_2^{l_2} \ldots x_n^{l_n} \tag{2}$$

whose coefficients are certain nonzero elements of $P$. Since the
terms (1) and (2) are distinct, at least one of the differences of the
exponents on the unknowns

$$k_i - l_i, \qquad i = 1, 2, \ldots, n$$

is nonzero. Term (1) will be considered higher than term (2) [and term
(2) lower than term (1)] if the first of these differences that is
nonzero is positive, that is, if there is an $i$, $1 \le i \le n$,
such that

$$k_1 = l_1, \; k_2 = l_2, \; \ldots, \; k_{i-1} = l_{i-1}, \; \text{but } k_i > l_i$$

In other words, term (1) will be higher than term (2) if the exponent
on $x_1$ in (1) is greater than in (2), or if these exponents are
equal but the exponent on $x_2$ in (1) is greater than in (2), and so
forth. It will readily be seen that from the fact that term (1) is
higher than term (2) it does not follow that the degree of the former
(all unknowns taken together) is greater than that of the latter: of
the terms

$$x_1^3 x_2, \qquad x_1 x_2^5 x_3^2$$

the first is higher though it is of lower degree.

It is obvious that of any two distinct terms of the polynomial
$f(x_1, x_2, \ldots, x_n)$, one will be higher than the other. It is
also easy to verify that if term (1) is higher than term (2), and (2),
in turn, is higher than the term

$$x_1^{m_1} x_2^{m_2} \ldots x_n^{m_n} \tag{3}$$

that is, there exists a $j$, $1 \le j \le n$, such that

$$l_1 = m_1, \; l_2 = m_2, \; \ldots, \; l_{j-1} = m_{j-1}, \; \text{but } l_j > m_j$$

then, irrespective of whether $i$ is greater than, equal to, or less
than $j$, term (1) will be higher than term (3). Thus, placing first
that term which is higher, we get a definite ordering of the terms of
the polynomial $f(x_1, x_2, \ldots, x_n)$, which is called
alphabetical.

Thus, the polynomial

$$f(x_1, x_2, x_3, x_4) = x_1^4 + 3x_1^2 x_2^3 x_3 - x_1^2 x_2 x_4^2 + 5x_1 x_3 x_4^2 + 2x_2 + x_3^3 x_4 - 4$$

is arranged in alphabetical order.
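The ordering rule translates directly into a comparison of exponent
tuples. The Python sketch below is an illustration, not part of the
text: it assumes a polynomial is stored as a dictionary from exponent
tuples to coefficients, and the function name `alphabetical` and the
sample polynomial are hypothetical.

```python
def alphabetical(poly):
    """Return the terms of poly from highest to lowest in alphabetical order.

    poly maps exponent tuples (k1, ..., kn) to coefficients. Python compares
    tuples position by position, so the first differing exponent decides,
    which is the comparison rule for terms described in the text.
    """
    return sorted(poly.items(), key=lambda term: term[0], reverse=True)

# Hypothetical sample in three unknowns, entered in no particular order.
sample = {(0, 2, 1): 4, (1, 5, 2): -1, (3, 1, 0): 2, (0, 0, 0): 7, (1, 5, 0): 3}
ordered = alphabetical(sample)
highest_exps, highest_coeff = ordered[0]

for exps, coeff in ordered:
    print(exps, coeff)
```

In this sample the term with exponents (3, 1, 0) comes first although
its total degree, 4, is smaller than the degree 8 of the term with
exponents (1, 5, 2).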

In the alphabetical notation of the polynomial
$f(x_1, x_2, \ldots, x_n)$ one of its terms will occupy first place,
that is, will be higher than any of the others. This term is called
the highest term of the polynomial; in the example given above,
$x_1^4$ is the highest term. We will now prove a lemma concerning
highest terms; it will be used in the proof of the fundamental theorem
of the next section.

The highest term of a product of two polynomials in $n$ unknowns is
equal to the product of the highest terms of the factors.

Indeed, suppose we are multiplying the polynomials
$f(x_1, x_2, \ldots, x_n)$ and $g(x_1, x_2, \ldots, x_n)$. If

$$a x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n} \tag{4}$$

is the highest term of the polynomial $f(x_1, x_2, \ldots, x_n)$, and

$$a' x_1^{s_1} x_2^{s_2} \ldots x_n^{s_n} \tag{5}$$

is any other term of this polynomial, then there is an $i$,
$1 \le i \le n$, such that

$$k_1 = s_1, \; \ldots, \; k_{i-1} = s_{i-1}, \; k_i > s_i$$

If, on the other hand,

$$b x_1^{l_1} x_2^{l_2} \ldots x_n^{l_n} \tag{6}$$

$$b' x_1^{t_1} x_2^{t_2} \ldots x_n^{t_n} \tag{7}$$

are the highest term and any other term of the polynomial
$g(x_1, x_2, \ldots, x_n)$, then there is a $j$, $1 \le j \le n$, such
that

$$l_1 = t_1, \; \ldots, \; l_{j-1} = t_{j-1}, \; l_j > t_j$$

Multiplying the terms (4) and (6) and also the terms (5) and (7), we
get

$$ab\, x_1^{k_1+l_1} x_2^{k_2+l_2} \ldots x_n^{k_n+l_n} \tag{8}$$

$$a'b'\, x_1^{s_1+t_1} x_2^{s_2+t_2} \ldots x_n^{s_n+t_n} \tag{9}$$

It is easy to see, however, that term (8) is higher than term (9); if,
say, $i \le j$, then

$$k_1 + l_1 = s_1 + t_1, \; \ldots, \; k_{i-1} + l_{i-1} = s_{i-1} + t_{i-1}, \; \text{but } k_i + l_i > s_i + t_i$$

since $k_i > s_i$, $l_i \ge t_i$. In the same way, we see that term
(8) is higher than the product of the terms (4) and (7), and also
higher than the product of the terms (5) and (6). Thus, term (8), the
product of the highest terms of the polynomials $f$ and $g$, will be
higher than all other terms obtained by termwise multiplication of the
polynomials $f$ and $g$, and so this term does not vanish when we
collect terms; that is to say, it remains the highest term in the
product $fg$.
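The lemma can be checked on a concrete pair of polynomials. The Python
sketch below is an illustration under stated assumptions, not part of
the text: polynomials are dictionaries from exponent tuples to integer
coefficients, and the names `multiply`, `highest_term` and the two
sample polynomials are hypothetical.

```python
def multiply(f, g):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    prod = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = tuple(a + b for a, b in zip(ef, eg))
            prod[e] = prod.get(e, 0) + cf * cg
    # drop terms whose coefficients cancelled out
    return {e: c for e, c in prod.items() if c != 0}

def highest_term(f):
    """Highest term in alphabetical order, as (exponent tuple, coefficient)."""
    e = max(f)          # tuples compare by the first differing exponent
    return e, f[e]

f = {(2, 0): 1, (1, 1): -3, (0, 2): 2}   # x1^2 - 3*x1*x2 + 2*x2^2
g = {(1, 1): 5, (0, 3): 1, (0, 0): -1}   # 5*x1*x2 + x2^3 - 1

(ef, cf), (eg, cg) = highest_term(f), highest_term(g)
expected = (tuple(a + b for a, b in zip(ef, eg)), cf * cg)
print(highest_term(multiply(f, g)) == expected)
```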


52. Symmetric Polynomials 


Conspicuous among polynomials in several unknowns are those that
remain unchanged no matter what rearrangements of the unknowns occur.
Thus, all unknowns appear in these polynomials in symmetric fashion,
whence the name symmetric polynomials (or symmetric functions). Among
the simplest examples are the sum of all unknowns
$x_1 + x_2 + \ldots + x_n$, the sum of the squares of the unknowns
$x_1^2 + x_2^2 + \ldots + x_n^2$, the product of the unknowns
$x_1 x_2 \ldots x_n$, and so on. Since any permutation on $n$ symbols
can be represented in the form of a product of transpositions (see
Sec. 3), it is sufficient, when proving the symmetry of a polynomial,
to verify that it remains unchanged under any transposition of two
unknowns.
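The transposition criterion gives a short symmetry test. The Python
sketch below is an illustration, not part of the text: it assumes
polynomials are stored as dictionaries from exponent tuples to
coefficients, and the names `swap`, `is_symmetric` and the two sample
polynomials are hypothetical.

```python
from itertools import combinations

def swap(poly, i, j):
    """Apply the transposition of the unknowns x_{i+1} and x_{j+1} to poly."""
    out = {}
    for exps, c in poly.items():
        e = list(exps)
        e[i], e[j] = e[j], e[i]
        out[tuple(e)] = c
    return out

def is_symmetric(poly, n):
    """True if poly is unchanged under every transposition of two unknowns."""
    return all(swap(poly, i, j) == poly for i, j in combinations(range(n), 2))

squares = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}   # x1^2 + x2^2 + x3^2
cyclic = {(2, 1, 0): 1, (0, 2, 1): 1, (1, 0, 2): 1}    # x1^2*x2 + x2^2*x3 + x3^2*x1

print(is_symmetric(squares, 3), is_symmetric(cyclic, 3))
```

The second sample is unchanged by cyclic shifts of the unknowns but
not by a single transposition, so it is not symmetric.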

We shall now consider symmetric polynomials in $n$ unknowns with
coefficients from some field $P$. It is easy to see that the sum,
difference and product of two symmetric polynomials are symmetric;
that is to say, symmetric polynomials form a subring in the ring
$P[x_1, x_2, \ldots, x_n]$ of all polynomials in $n$ unknowns over the
field $P$; this is called the ring of symmetric polynomials in $n$
unknowns over the field $P$. It includes all elements of $P$ (that is,
all polynomials of degree zero and also zero), since they definitely
do not change under any rearrangement of the unknowns. Any other
symmetric polynomial invariably contains all $n$ unknowns and even has
one and the same degree with respect to them: if a symmetric
polynomial $f(x_1, x_2, \ldots, x_n)$ has a term in which the unknown
$x_i$ appears with an exponent $k$, then it also has a term obtained
from the first one by a transposition of the unknowns $x_i$ and $x_j$,
that is, one containing the unknown $x_j$ to the same power $k$.

The following $n$ symmetric polynomials in $n$ unknowns are called
elementary symmetric polynomials:

$$\begin{aligned}
\sigma_1 &= x_1 + x_2 + \ldots + x_n, \\
\sigma_2 &= x_1 x_2 + x_1 x_3 + \ldots + x_{n-1} x_n, \\
\sigma_3 &= x_1 x_2 x_3 + x_1 x_2 x_4 + \ldots + x_{n-2} x_{n-1} x_n, \\
&\;\;\vdots \\
\sigma_{n-1} &= x_1 x_2 \ldots x_{n-1} + x_1 x_2 \ldots x_{n-2} x_n + \ldots + x_2 x_3 \ldots x_n, \\
\sigma_n &= x_1 x_2 \ldots x_n
\end{aligned} \tag{1}$$

These polynomials, whose symmetry is obvious, play a very great role
in the theory of symmetric polynomials. They are suggested by the
Vieta formulas (see Sec. 24) and so we can say that the coefficients
of a polynomial in one unknown, the leading coefficient of which is
unity, will, to within sign, be elementary symmetric polynomials with
respect to its roots. This relationship between elementary symmetric
polynomials and the Vieta formulas will be very essential for those
applications of symmetric polynomials to the theory of polynomials in
one unknown which justify their study.
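The connection with the Vieta formulas can be checked numerically. The
Python sketch below is an illustration, not part of the text: the
names `elementary` and `monic_from_roots` and the sample roots are
hypothetical. It builds the monic polynomial with given roots and
compares its coefficients with the values of the elementary symmetric
polynomials at those roots.

```python
from itertools import combinations
from math import prod

def elementary(values, k):
    """Value of the k-th elementary symmetric polynomial at the given numbers."""
    return sum(prod(c) for c in combinations(values, k))

def monic_from_roots(roots):
    """Coefficients of (x - r1)...(x - rn), highest power first."""
    coeffs = [1]
    for r in roots:
        # multiply the current polynomial by (x - r)
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

roots = [1, 2, 3, 4]
coeffs = monic_from_roots(roots)

# Vieta: the coefficient of x^(n-k) equals (-1)^k * sigma_k of the roots.
for k, c in enumerate(coeffs):
    print(k, c, (-1) ** k * elementary(roots, k))
```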

Since symmetric polynomials in $n$ unknowns $x_1, x_2, \ldots, x_n$
over the field $P$ constitute a ring, the following assertions are
obvious: we have a symmetric polynomial in the case of any positive
integer power of any one of the elementary symmetric polynomials, also
in the case of a product of such powers (taken with any coefficient of
$P$), and, finally, in the case of any sum of these products. In other
words, any polynomial in the elementary symmetric polynomials
$\sigma_1, \sigma_2, \ldots, \sigma_n$ with coefficients from $P$,
regarded as a polynomial in the unknowns $x_1, x_2, \ldots, x_n$, will
be symmetric. For example, set $n = 3$ and take the polynomial
$\sigma_1 \sigma_2 + 2\sigma_3$. Replacing $\sigma_1$, $\sigma_2$ and
$\sigma_3$ by their expressions, we get

$$\sigma_1 \sigma_2 + 2\sigma_3 = x_1^2 x_2 + x_1^2 x_3 + x_1 x_2^2 + x_2^2 x_3 + x_1 x_3^2 + x_2 x_3^2 + 5x_1 x_2 x_3$$

What we have on the right is obviously a symmetric polynomial in
$x_1, x_2, x_3$.

The converse of this result is the following fundamental theorem on
symmetric polynomials.

Any symmetric polynomial in the unknowns $x_1, x_2, \ldots, x_n$ over
the field $P$ is a polynomial in the elementary symmetric polynomials
$\sigma_1, \sigma_2, \ldots, \sigma_n$ with coefficients belonging to
$P$.

Indeed, suppose we have the symmetric polynomial

$$f(x_1, x_2, \ldots, x_n)$$

and, in the alphabetical notation, let the highest term be

$$a_0 x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n} \tag{2}$$

The exponents on the unknowns in this term must satisfy the
inequalities

$$k_1 \ge k_2 \ge \ldots \ge k_n \tag{3}$$

Indeed, suppose, for some $i$, we have $k_i < k_{i+1}$. However, since
the polynomial $f(x_1, x_2, \ldots, x_n)$ is symmetric, it must
contain the term

$$a_0 x_1^{k_1} x_2^{k_2} \ldots x_i^{k_{i+1}} x_{i+1}^{k_i} \ldots x_n^{k_n} \tag{4}$$

which is obtained from term (2) by a transposition of the unknowns
$x_i$ and $x_{i+1}$. This is a contradiction, since term (4) is higher
than term (2) alphabetically: the exponents on
$x_1, x_2, \ldots, x_{i-1}$ coincide in both terms, but the exponent
on $x_i$ in term (4) is greater than in term (2).

Let us now take the following product of elementary symmetric
polynomials [all exponents will be nonnegative because of inequalities
(3)]:

$$\varphi_1 = a_0 \sigma_1^{k_1-k_2} \sigma_2^{k_2-k_3} \ldots \sigma_{n-1}^{k_{n-1}-k_n} \sigma_n^{k_n} \tag{5}$$

This is a symmetric polynomial in the unknowns
$x_1, x_2, \ldots, x_n$, and its highest term is equal to term (2).
Indeed, the highest terms of the polynomials
$\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_n$ are equal,
respectively, to $x_1$, $x_1 x_2$, $x_1 x_2 x_3$, $\ldots$,
$x_1 x_2 \ldots x_n$, and since it was proved at the end of the
preceding section that the highest term of a product is equal to the
product of the highest terms of the factors, it follows that the
highest term of the polynomial $\varphi_1$ is

$$a_0 x_1^{k_1-k_2} (x_1 x_2)^{k_2-k_3} (x_1 x_2 x_3)^{k_3-k_4} \ldots (x_1 x_2 \ldots x_{n-1})^{k_{n-1}-k_n} (x_1 x_2 \ldots x_n)^{k_n} = a_0 x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n}$$

From this it follows that when we subtract $\varphi_1$ from $f$, the
highest terms of these polynomials cancel out, that is, the highest
term of the symmetric polynomial $f - \varphi_1 = f_1$ will be lower
than the term (2), which is the highest one in $f$. Repeating this
same procedure for the polynomial $f_1$, whose coefficients obviously
belong to the field $P$, we get the equality

$$f_1 = \varphi_2 + f_2$$

where $\varphi_2$ is the product of the powers of elementary symmetric
polynomials with a coefficient in $P$, and $f_2$ is a symmetric
polynomial whose highest term is lower than the highest term in $f_1$,
whence the equality

$$f = \varphi_1 + \varphi_2 + f_2$$

Continuing this process, we get $f_s = 0$ for some $s$ and therefore
arrive at an expression of $f$ in the form of a polynomial in
$\sigma_1, \sigma_2, \ldots, \sigma_n$ with coefficients in $P$:

$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{s} \varphi_i = \varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$$

Indeed, if this process were endless,* we would obtain an infinite
sequence of symmetric polynomials

$$f_1, f_2, \ldots, f_s, \ldots \tag{6}$$

and the highest term of each would be lower than the highest terms of
the preceding polynomials, and all the more so lower than (2).
However, if

$$b x_1^{l_1} x_2^{l_2} \ldots x_n^{l_n} \tag{7}$$

is the highest term of the polynomial $f_s$, then from the symmetry of
this polynomial there follow the inequalities

$$l_1 \ge l_2 \ge \ldots \ge l_n \tag{8}$$

which are similar to the inequalities (3). On the other hand, since
term (2) is higher than term (7), it follows that

$$k_1 \ge l_1 \tag{9}$$

It is readily seen, however, that the systems of nonnegative integers
$l_1, l_2, \ldots, l_n$ which satisfy the inequalities (8) and (9) may
be chosen in only a finite number of ways. Indeed, even if we give up
the requirement (8) and only assume that all $l_i$,
$i = 1, 2, \ldots, n$, do not exceed $k_1$, then the choice of numbers
$l_i$ will be possible in only $(k_1 + 1)^n$ ways. Whence it follows
that the sequence of polynomials (6) with strictly descending highest
terms cannot be infinite.

* One must bear in mind that, generally speaking, the polynomial
$\varphi_s$ also contains terms not found in the polynomial $f_{s-1}$,
and therefore the transition from $f_{s-1}$ to
$f_s = f_{s-1} - \varphi_s$ is connected not only with eliminating
certain terms from $f_{s-1}$ but also with the appearance of new
terms. Here, $s = 1, 2, \ldots$.

This completes the proof of the theorem. 
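The reduction used in the proof can be carried out mechanically. The
Python sketch below is an illustration under stated assumptions, not
part of the text: polynomials are dictionaries from exponent tuples to
integer coefficients, and the names `multiply`, `elementary` and
`to_elementary` are hypothetical. At each step it takes the highest
term, forms the product (5), and subtracts it, until nothing is left.

```python
from itertools import combinations

def multiply(f, g):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    prod = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = tuple(a + b for a, b in zip(ef, eg))
            prod[e] = prod.get(e, 0) + cf * cg
    return {e: c for e, c in prod.items() if c != 0}

def elementary(n, k):
    """The k-th elementary symmetric polynomial in n unknowns."""
    return {tuple(1 if i in chosen else 0 for i in range(n)): 1
            for chosen in combinations(range(n), k)}

def to_elementary(f, n):
    """Express a symmetric polynomial f through sigma_1, ..., sigma_n.

    Returns {(d1, ..., dn): c}, meaning the sum of c * sigma_1^d1 ... sigma_n^dn.
    Raises ValueError if a highest term violates inequalities (3),
    which happens when f is not symmetric.
    """
    f = dict(f)
    result = {}
    while f:
        k = max(f)                      # exponents of the highest term
        a = f[k]
        if any(k[i] < k[i + 1] for i in range(n - 1)):
            raise ValueError("polynomial is not symmetric")
        # exponents of sigma_1, ..., sigma_n in the product (5)
        d = tuple(k[i] - k[i + 1] for i in range(n - 1)) + (k[-1],)
        result[d] = a
        phi = {(0,) * n: a}
        for i, power in enumerate(d):
            for _ in range(power):
                phi = multiply(phi, elementary(n, i + 1))
        for e, c in phi.items():        # f := f - phi
            f[e] = f.get(e, 0) - c
            if f[e] == 0:
                del f[e]
    return result

# x1^3 + x2^3 + x3^3 in three unknowns
cubes = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}
print(to_elementary(cubes, 3))
```

The loop terminates for the reason given in the proof: the highest
terms strictly descend and only finitely many exponent systems are
available.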

The above-indicated relationship between elementary symmetric
polynomials and the Vieta formulas permits deriving the following
important corollary from the fundamental theorem on symmetric
polynomials.

Let $f(x)$ be a polynomial in one unknown over the field $P$ having
the leading coefficient unity. Then any symmetric polynomial (with
coefficients from $P$) in the roots of the polynomial $f(x)$, which
roots belong to some splitting field of the polynomial $f(x)$ over
$P$, will be a polynomial (with coefficients from $P$) in the
coefficients of the polynomial $f(x)$ and therefore will be an element
of $P$.

The foregoing proof of the fundamental theorem also provides us with a
practical method for finding the expressions of symmetric polynomials
in terms of elementary polynomials. Let us first introduce the
following notation: if

$$a x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n} \tag{10}$$

is some product of powers of the unknowns $x_1, x_2, \ldots, x_n$
(some of the exponents may be equal to zero), then

$$S(a x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n}) \tag{11}$$

will denote the sum of all terms obtained from (10) by all possible
rearrangements of the unknowns. It is obvious that this will be a
symmetric polynomial and homogeneous too, and that any symmetric
polynomial in $n$ unknowns containing the term (10) will also contain
all the other terms of the polynomial (11). For example,
$S(x_1) = \sigma_1$, $S(x_1 x_2) = \sigma_2$, $S(x_1^2)$ is the sum of
the squares of all the unknowns, etc.

Example. Express the symmetric polynomial $f = S(x_1^2 x_2)$ in $n$
unknowns in terms of the elementary symmetric polynomials.

Here, the highest term is $x_1^2 x_2$ and therefore
$\varphi_1 = \sigma_1^{2-1} \sigma_2^{1} = \sigma_1 \sigma_2$, that is,

$$\varphi_1 = (x_1 + x_2 + \ldots + x_n)(x_1 x_2 + x_1 x_3 + \ldots + x_{n-1} x_n) = S(x_1^2 x_2) + 3S(x_1 x_2 x_3)$$

whence

$$f_1 = f - \varphi_1 = -3S(x_1 x_2 x_3) = -3\sigma_3$$

Therefore, $f = \varphi_1 + f_1 = \sigma_1 \sigma_2 - 3\sigma_3$.
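The identity $S(x_1^2 x_2) = \sigma_1 \sigma_2 - 3\sigma_3$ can be
spot-checked by substituting integers for the unknowns. The Python
sketch below is an illustration only; the names `sigma`, `s_x1sq_x2`
and `identity_holds` and the sample points are hypothetical. Agreement
at a few points is a sanity check, not a proof.

```python
from itertools import combinations, permutations
from math import prod

def sigma(values, k):
    """Value of the k-th elementary symmetric polynomial at the given numbers."""
    return sum(prod(c) for c in combinations(values, k))

def s_x1sq_x2(values):
    """Value of S(x1^2 x2): the sum of x_i^2 * x_j over all ordered pairs i != j."""
    return sum(a * a * b for a, b in permutations(values, 2))

def identity_holds(values):
    """Check f = sigma_1 * sigma_2 - 3 * sigma_3 at one integer point."""
    s1, s2, s3 = (sigma(values, k) for k in (1, 2, 3))
    return s_x1sq_x2(values) == s1 * s2 - 3 * s3

points = [(1, 2, 3), (2, -1, 3, 5), (0, 7, -4, 1, 6)]
print(all(identity_holds(p) for p in points))
```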


In more involved cases, it is advisable first to determine which terms
can enter into the expression of the given polynomial via elementary
polynomials, and then to find the coefficients of these terms by the
method of undetermined coefficients.

Example 1. Find an expression for the symmetric polynomial
$f = S(x_1^2 x_2^2)$.

We know (see the proof of the fundamental theorem) that the terms of
the desired polynomial $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$
are determined via the highest terms of the symmetric polynomials
$f_1, f_2, \ldots$, these highest terms being lower than the highest
term of the given polynomial $f$, that is, lower than $x_1^2 x_2^2$.
We find all the products $x_1^{l_1} x_2^{l_2} \ldots x_n^{l_n}$ that
satisfy the following conditions: (1) they are lower than the term
$x_1^2 x_2^2$, (2) they can serve as the highest terms of symmetric
polynomials, i.e., they satisfy the inequalities
$l_1 \ge l_2 \ge \ldots \ge l_n$, (3) with respect to all unknowns
taken together they have the degree 4 (since, as we know, all the
polynomials $f_1, f_2, \ldots$ have the same degree as the homogeneous
polynomial $f$). Writing out only appropriate combinations of
exponents and indicating, alongside, those products of powers of
$\sigma$ which are determined by them, we get the following table:

    2 2 0 0 0 ...    $\sigma_1^{2-2} \sigma_2^{2-0} = \sigma_2^2$,
    2 1 1 0 0 ...    $\sigma_1^{2-1} \sigma_2^{1-1} \sigma_3^{1-0} = \sigma_1 \sigma_3$,
    1 1 1 1 0 ...    $\sigma_1^{1-1} \sigma_2^{1-1} \sigma_3^{1-1} \sigma_4^{1-0} = \sigma_4$

Thus, the polynomial $f$ has the form

$$f = \sigma_2^2 + A \sigma_1 \sigma_3 + B \sigma_4$$

We set the coefficient of $\sigma_2^2$ equal to unity, since this term
is determined by the highest term of the polynomial $f$ and, as we
know from the proof of the fundamental theorem, has the same
coefficient. The coefficients $A$ and $B$ are found as follows.

Set $x_1 = x_2 = x_3 = 1$, $x_4 = \ldots = x_n = 0$. It is easy to see
that for these values of the unknowns the polynomial $f$ has the value
3, and the polynomials $\sigma_1$, $\sigma_2$, $\sigma_3$ and
$\sigma_4$, the values 3, 3, 1, and 0, respectively. Therefore,

$$3 = 9 + A \cdot 3 \cdot 1 + B \cdot 0$$

whence $A = -2$. Now put $x_1 = x_2 = x_3 = x_4 = 1$,
$x_5 = \ldots = x_n = 0$. The values of the polynomials $f$,
$\sigma_1$, $\sigma_2$, $\sigma_3$ and $\sigma_4$ will be 6, 4, 6, 4,
1, respectively. Therefore,

$$6 = 36 - 2 \cdot 4 \cdot 4 + B \cdot 1$$

whence $B = 2$. Thus, for $f$ the desired expression is

$$f = \sigma_2^2 - 2\sigma_1 \sigma_3 + 2\sigma_4$$
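The two substitutions used to find $A$ and $B$ can be reproduced in a
few lines. The Python sketch below is an illustration, not part of the
text: it fixes $n = 5$ as an assumption, and the names `sigma`, `f`,
`p1`, `p2` and `check_point` are hypothetical. After solving for $A$
and $B$ it compares both sides of the resulting identity at one more
integer point.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def sigma(values, k):
    """Value of the k-th elementary symmetric polynomial at the given numbers."""
    return sum(prod(c) for c in combinations(values, k))

def f(values):
    """Value of S(x1^2 x2^2): the sum of x_i^2 * x_j^2 over pairs i < j."""
    return sum(a * a * b * b for a, b in combinations(values, 2))

# First substitution: three ones, the rest zeros. sigma_4 vanishes, so B drops out
# of f = sigma_2^2 + A*sigma_1*sigma_3 + B*sigma_4.
p1 = (1, 1, 1, 0, 0)
s1, s2, s3, s4 = (sigma(p1, k) for k in range(1, 5))
A = Fraction(f(p1) - s2 ** 2, s1 * s3)

# Second substitution: four ones, the rest zeros.
p2 = (1, 1, 1, 1, 0)
t1, t2, t3, t4 = (sigma(p2, k) for k in range(1, 5))
B = Fraction(f(p2) - t2 ** 2 - A * t1 * t3, t4)

# Spot check of the resulting identity at another integer point.
check_point = (2, -1, 3, 5, 4)
u1, u2, u3, u4 = (sigma(check_point, k) for k in range(1, 5))
print(A, B, f(check_point) == u2 ** 2 + A * u1 * u3 + B * u4)
```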
Example 2. Find the sum of the cubes of the roots of the polynomial

$$f(x) = x^4 + x^3 + 2x^2 + x + 1$$

To solve this problem, let us find the expression for the symmetric
polynomial $S(x_1^3)$ in terms of the elementary symmetric
polynomials. Applying the same method as in the preceding example, we
get the table

    3 0 0 0 ...    $\sigma_1^3$,
    2 1 0 0 ...    $\sigma_1 \sigma_2$,
    1 1 1 0 ...    $\sigma_3$

and therefore

$$S(x_1^3) = \sigma_1^3 + A \sigma_1 \sigma_2 + B \sigma_3$$

First assuming $x_1 = x_2 = 1$, $x_3 = \ldots = x_n = 0$, and then
$x_1 = x_2 = x_3 = 1$, $x_4 = \ldots = x_n = 0$, we get $A = -3$,
$B = 3$, that is,

$$S(x_1^3) = \sigma_1^3 - 3\sigma_1 \sigma_2 + 3\sigma_3 \tag{12}$$

To find the sum of the cubes of the roots of the given polynomial
$f(x)$, it is necessary (because of the Vieta formulas) to replace, in
the above-found expression, $\sigma_1$ by the coefficient of $x^3$
with sign reversed, that is, by $-1$, then to replace $\sigma_2$ by
the coefficient of $x^2$, that is, by 2, and, finally, to replace
$\sigma_3$ by the coefficient of $x$ with sign reversed, i.e., by
$-1$. Thus, the sum we are interested in (the sum of the cubes of the
roots) is equal to

$$(-1)^3 - 3 \cdot (-1) \cdot 2 + 3(-1) = 2$$

The reader can verify this result if he takes into account that $f(x)$
has as roots the numbers $i$, $-i$,
$-\frac{1}{2} + i\frac{\sqrt{3}}{2}$ and
$-\frac{1}{2} - i\frac{\sqrt{3}}{2}$. It is also obvious that the
formula (12) does not depend on the given polynomial $f(x)$ and
enables us to find the sum of the cubes of the roots of any
polynomial.
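The suggested verification takes only a few lines of Python. The
sketch below is an illustration, not part of the text; the variable
names are hypothetical. It evaluates formula (12) from the
coefficients and compares the result with the sum of the cubes of the
four roots, computed in floating-point complex arithmetic and therefore
compared only up to rounding error.

```python
# Coefficients of x^3, x^2 and x in the monic polynomial x^4 + x^3 + 2x^2 + x + 1.
a1, a2, a3 = 1, 2, 1

# Vieta formulas: sigma_k of the roots equals (-1)^k times the coefficient a_k.
s1, s2, s3 = -a1, a2, -a3
via_formula = s1 ** 3 - 3 * s1 * s2 + 3 * s3     # formula (12)

# Direct check with the four complex roots.
r = 3 ** 0.5 / 2
roots = [1j, -1j, complex(-0.5, r), complex(-0.5, -r)]
direct = sum(z ** 3 for z in roots)

print(via_formula, abs(direct - via_formula) < 1e-12)
```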


The method, obtained in the proof of the fundamental theorem, for
expressing a symmetric polynomial $f$ in terms of the elementary
polynomials leads to a very definite polynomial in
$\sigma_1, \sigma_2, \ldots, \sigma_n$. It turns out that there is no
way of obtaining a different expression for $f$ in terms of
$\sigma_1, \sigma_2, \ldots, \sigma_n$. This is indicated by the
following uniqueness theorem.

Every symmetric polynomial has only a unique expression in the form of
a polynomial in the elementary symmetric polynomials.

Here is the proof. If a symmetric polynomial
$f(x_1, x_2, \ldots, x_n)$ over a field $P$ had two distinct
expressions in terms of $\sigma_1, \sigma_2, \ldots, \sigma_n$,

$$f(x_1, x_2, \ldots, x_n) = \varphi(\sigma_1, \sigma_2, \ldots, \sigma_n) = \psi(\sigma_1, \sigma_2, \ldots, \sigma_n)$$

then the difference

$$\chi(\sigma_1, \sigma_2, \ldots, \sigma_n) = \varphi(\sigma_1, \sigma_2, \ldots, \sigma_n) - \psi(\sigma_1, \sigma_2, \ldots, \sigma_n)$$

would be a nonzero polynomial in
$\sigma_1, \sigma_2, \ldots, \sigma_n$; that is, not all its
coefficients would be zero, whereas replacing
$\sigma_1, \sigma_2, \ldots, \sigma_n$ in this polynomial by their
expressions in terms of $x_1, x_2, \ldots, x_n$ would lead to the zero
of the ring $P[x_1, x_2, \ldots, x_n]$. It therefore remains to prove
that if a polynomial $\chi(\sigma_1, \sigma_2, \ldots, \sigma_n)$ is
different from zero, that is, has at least one nonzero coefficient,
then the polynomial $g(x_1, x_2, \ldots, x_n)$ obtained from $\chi$ by
replacing $\sigma_1, \sigma_2, \ldots, \sigma_n$ by their expressions
in terms of $x_1, x_2, \ldots, x_n$,

$$\chi(\sigma_1, \sigma_2, \ldots, \sigma_n) = g(x_1, x_2, \ldots, x_n) \tag{13}$$

is also nonzero.

If $a\sigma_1^{k_1}\sigma_2^{k_2} \ldots \sigma_n^{k_n}$ is one of the terms of the
polynomial $\chi$, $a \neq 0$, then after replacing all $\sigma$ by their expressions (1),
we get a polynomial in $x_1, x_2, \ldots, x_n$ whose highest term (in the sense of
alphabetical ordering) is, as we already know from the proof of the
fundamental theorem, the term

$$a x_1^{k_1}(x_1x_2)^{k_2} \ldots (x_1x_2 \ldots x_n)^{k_n} = a x_1^{l_1}x_2^{l_2} \ldots x_n^{l_n}$$

where

$$l_1 = k_1 + k_2 + \ldots + k_n,$$
$$l_2 = k_2 + \ldots + k_n,$$
$$\ldots$$
$$l_n = k_n$$

Whence

$$k_i = l_i - l_{i+1}, \quad i = 1, 2, \ldots, n-1; \qquad k_n = l_n$$

That is to say, using the exponents $l_1, l_2, \ldots, l_n$, we can restore the
exponents $k_1, k_2, \ldots, k_n$ of the initial term of the polynomial $\chi$.
Thus, distinct terms of the polynomial $\chi$, which are regarded as
polynomials in $x_1, x_2, \ldots, x_n$, have distinct highest terms.
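The passage between the exponents $k_i$ and $l_i$ is a pair of mutually inverse maps, and this is what makes distinct terms of $\chi$ produce distinct highest terms. A minimal sketch follows; the function names are chosen here for illustration only.

```python
def leading_exponents(k):
    """Exponents (l_1, ..., l_n) of the highest term of
    sigma_1^k_1 * ... * sigma_n^k_n: l_i = k_i + k_(i+1) + ... + k_n."""
    return [sum(k[i:]) for i in range(len(k))]

def recover_exponents(l):
    """Inverse map: k_i = l_i - l_(i+1) for i < n, and k_n = l_n."""
    return [l[i] - l[i + 1] for i in range(len(l) - 1)] + [l[-1]]

k = [1, 0, 2]                    # the term sigma_1 * sigma_3^2 in three unknowns
l = leading_exponents(k)         # exponents of its highest term in x_1, x_2, x_3
k_again = recover_exponents(l)   # restores the original exponents
```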


Let us now consider all the terms of the polynomial $\chi$: for each
one of them let us find the highest term of its representation in the
form of a polynomial in $x_1, x_2, \ldots, x_n$ and select that highest term
which is highest in the alphabetical-ordering sense. As has been
pointed out above, this term does not have any similar ones among
the highest terms obtained from the other terms of the polynomial $\chi$,
and since, by hypothesis, it is higher than each of these highest terms,
it is all the more so higher than the other terms obtained when replacing,
in the terms of the polynomial $\chi$, the elements $\sigma_1, \sigma_2, \ldots, \sigma_n$
by their expressions (1). We have thus found a term which, when passing
from $\chi(\sigma_1, \sigma_2, \ldots, \sigma_n)$ to $g(x_1, x_2, \ldots, x_n)$, appears (with
nonzero coefficient) only once and for this reason cannot be cancelled
out with anything in any way. Whence it follows that not all coefficients
of the polynomial $g(x_1, x_2, \ldots, x_n)$ are equal to zero, that
is, this polynomial is not a zero element of the ring
$P[x_1, x_2, \ldots, x_n]$. The proof is complete.

Evidently, this theorem could also be stated in the following
manner.

A system of elementary symmetric polynomials $\sigma_1, \sigma_2, \ldots, \sigma_n$
regarded as elements of the polynomial ring $P[x_1, x_2, \ldots, x_n]$ is
algebraically independent over the field $P$.


53. Symmetric Polynomials Continued 


Remarks on the fundamental theorem. The proof of the fundamental
theorem on symmetric polynomials given in the preceding section
admits of a number of essential supplements to the statement of the
theorem. We will make use of them in what follows. First of all,
the coefficients of the polynomial $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$ which we found
as an expression for the symmetric polynomial $f(x_1, x_2, \ldots, x_n)$
in terms of the elementary symmetric polynomials not only belong
to the field $P$, but are even expressed in terms of the coefficients of the
polynomial $f$ by means of addition and subtraction, i.e., they belong
to the ring $L$ generated by the coefficients of the polynomial $f$ inside the
field $P$.

Indeed, all coefficients of the polynomial $\varphi_1$ [see formula
(5) of the preceding section] in the unknowns $x_1, x_2, \ldots, x_n$ are,
as will readily be seen, integral multiples of the coefficient $a_0$ of the
highest term of the polynomial $f$ and for this reason belong to the
ring $L$. Let it be already proved that $L$ contains all coefficients (in
$x_1, x_2, \ldots, x_n$) of the polynomials $\varphi_1, \varphi_2, \ldots, \varphi_i$. Then the
coefficients of the polynomial $f_i = f - \varphi_1 - \varphi_2 - \ldots - \varphi_i$ will also
belong to $L$, and therefore $L$ contains all coefficients of the
polynomial $\varphi_{i+1}$ in $x_1, x_2, \ldots, x_n$.


On the other hand, the degree of the polynomial $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$
with respect to $\sigma_1, \sigma_2, \ldots, \sigma_n$ taken together is equal to the degree of
the polynomial $f(x_1, x_2, \ldots, x_n)$ with respect to each of the unknowns
$x_i$. Indeed, since (2) of Sec. 52 is the highest term of the polynomial $f$,
it follows that $k_1$ will be the degree of $f$ in the unknown $x_1$ and
therefore, by symmetry, in any other of the unknowns $x_i$ as well. However,
the degree of $\varphi_1$ with respect to the $\sigma$ jointly is, by (5) of Sec. 52, equal
to the number

$$(k_1 - k_2) + (k_2 - k_3) + \ldots + (k_{n-1} - k_n) + k_n = k_1$$

Furthermore, since the leading term of the polynomial $f_1$ is lower
than the leading term of the polynomial $f$, it follows that the degree
of $f_1$ with respect to each one of the $x_i$ will not exceed the degree of
$f$ with respect to each one of these unknowns. However, for $f_1$ the
polynomial $\varphi_2$ plays the same role as $\varphi_1$ for $f$, and so the degree of
$\varphi_2$ with respect to the $\sigma$ jointly is equal to the degree of $f_1$ with respect
to each one of the $x_i$; that is, it does not exceed $k_1$, and so on. Thus,
likewise, the degree of $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$ does not exceed $k_1$. But since
no $\varphi_i$ with $i > 1$ can contain all of $\sigma_1, \sigma_2, \ldots, \sigma_n$ to the same powers
as $\varphi_1$, the degree of $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$ is exactly equal to $k_1$. Our
assertion is thus proved.

Finally, let $a\sigma_1^{l_1}\sigma_2^{l_2} \ldots \sigma_n^{l_n}$ be one of the terms of the
polynomial $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$. We give the name "weight" of this
term to the number

$$l_1 + 2l_2 + \ldots + nl_n$$

that is, to the sum of the exponents multiplied by the indices of the
corresponding $\sigma_i$. In other words, this is the degree of our term with
respect to the unknowns $x_1, x_2, \ldots, x_n$ taken together, as follows
from the theorem (proved in Sec. 51) on the degree of a product of
polynomials. Then the following assertion holds true.

If, with respect to the totality of unknowns, a homogeneous symmetric
polynomial $f(x_1, x_2, \ldots, x_n)$ has degree $s$, then all terms of its
expression $\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n)$ via the $\sigma$ will have the same weight, equal to $s$.
Indeed, if (2) of Sec. 52 is the highest term of the homogeneous
polynomial $f$, then

$$s = k_1 + k_2 + \ldots + k_n$$

However, the weight of the term $\varphi_1$ is, by (5) of Sec. 52, equal to

$$(k_1 - k_2) + 2(k_2 - k_3) + \ldots + (n-1)(k_{n-1} - k_n) + nk_n = k_1 + k_2 + k_3 + \ldots + k_n$$

That is, it is also equal to $s$. Furthermore, the polynomial $f_1 = f - \varphi_1$,
being the difference of two homogeneous polynomials of degree
$s$, will itself be homogeneous of degree $s$, and therefore the term $\varphi_2$
of the polynomial $\varphi$ will have weight $s$, etc.
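The three remarks (integer combinations of the coefficients, degree in the $\sigma$ equal to $k_1$, constant weight) can be observed on a small case. The sketch below is an illustrative implementation of the reduction from the proof of the fundamental theorem, not code from the text: polynomials are stored as dictionaries from exponent tuples to coefficients, the input is assumed to be symmetric, and all function names are hypothetical.

```python
from itertools import combinations

def elementary(n, k):
    """sigma_k in n unknowns as {exponent tuple: coefficient}."""
    return {tuple(1 if i in idx else 0 for i in range(n)): 1
            for idx in combinations(range(n), k)}

def multiply(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def express_in_sigma(f, n):
    """Rewrite a symmetric polynomial f as {sigma exponent tuple: coefficient}."""
    f = {e: c for e, c in f.items() if c != 0}
    phi = {}
    while f:
        k = max(f)                    # highest term: tuples compare lexicographically
        a = f[k]
        m = tuple(k[i] - k[i + 1] for i in range(n - 1)) + (k[n - 1],)
        phi[m] = a
        term = {(0,) * n: a}          # a * sigma_1^m_1 ... sigma_n^m_n, expanded in x
        for i, power in enumerate(m):
            for _ in range(power):
                term = multiply(term, elementary(n, i + 1))
        for e, c in term.items():     # f := f - term
            f[e] = f.get(e, 0) - c
            if f[e] == 0:
                del f[e]
    return phi

def weight(m):
    """Weight l_1 + 2*l_2 + ... + n*l_n of a term in the sigma."""
    return sum((i + 1) * power for i, power in enumerate(m))

s3 = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}   # x1^3 + x2^3 + x3^3
phi = express_in_sigma(s3, 3)
```

For this input `phi` encodes $\sigma_1^3 - 3\sigma_1\sigma_2 + 3\sigma_3$: the coefficients are integers, the largest total degree in the $\sigma$ is $3 = k_1$, and every term has weight 3.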

Symmetric rational fractions. The fundamental theorem on
symmetric polynomials can be extended to the case of rational fractions.
Let us call the rational fraction $\frac{f}{g}$ in $n$ unknowns $x_1, x_2, \ldots, x_n$
symmetric if it remains equal to itself under any rearrangement of the
unknowns. It is easy to demonstrate that this definition does not depend
on whether we take the fraction $\frac{f}{g}$ or an equivalent fraction
$\frac{f_0}{g_0}$. Indeed, if $\omega$ is some arrangement of our unknowns, and $\varphi$
is an arbitrary polynomial in these unknowns, then let us agree to
use $\varphi^{\omega}$ to denote the polynomial into which $\varphi$ is carried by the
arrangement $\omega$. By hypothesis, for any $\omega$,

$$\frac{f}{g} = \frac{f^{\omega}}{g^{\omega}}$$

That is, $fg^{\omega} = gf^{\omega}$. On the other hand, from

$$\frac{f}{g} = \frac{f_0}{g_0}$$

it follows that $fg_0 = gf_0$, whence $f^{\omega}g_0^{\omega} = g^{\omega}f_0^{\omega}$. Multiplying both
sides by $f$, we get

$$ff^{\omega}g_0^{\omega} = fg^{\omega}f_0^{\omega} = gf^{\omega}f_0^{\omega}$$

Whence, by cancelling out $f^{\omega}$, it follows that $fg_0^{\omega} = gf_0^{\omega}$, or

$$\frac{f_0^{\omega}}{g_0^{\omega}} = \frac{f}{g} = \frac{f_0}{g_0}$$

The following theorem is valid.

Any symmetric rational fraction in the unknowns $x_1, x_2, \ldots, x_n$
with coefficients from the field $P$ can be represented as a rational fraction
in the elementary symmetric polynomials $\sigma_1, \sigma_2, \ldots, \sigma_n$ with
coefficients which again belong to $P$.

Indeed, suppose we have the symmetric rational fraction

$$\frac{f(x_1, x_2, \ldots, x_n)}{g(x_1, x_2, \ldots, x_n)}$$


Assuming it to be in lowest terms, we could prove that both f and g 
are symmetric polynomials. However, a simpler way is the following. 
If the polynomial g is not symmetric, multiply the numerator and 
the denominator by the product of all n! — 1 polynomials obtained 
from g under all possible nonidentical permutations of the unknowns. 
It is easy to check that the denominator will now be a symmetric 
polynomial. From this it follows, by the symmetry of the entire 
fraction, that the numerator will now also be symmetric, and so to 
prove the theorem all we have to do is express the numerator and the 
denominator in terms of the elementary symmetric polynomials. 

Power sums. In applications we often encounter the symmetric
polynomials

$$s_k = x_1^k + x_2^k + \ldots + x_n^k, \qquad k = 1, 2, \ldots$$

which are sums of the $k$th powers of the unknowns $x_1, x_2, \ldots, x_n$.
These polynomials, called power sums, must be expressible (by the
fundamental theorem) in terms of elementary symmetric polynomials.
However, for large $k$, it is extremely difficult to find these expressions,
and so of interest is the relationship between the polynomials
$s_1, s_2, \ldots$ and $\sigma_1, \sigma_2, \ldots, \sigma_n$, which we will now establish.

First of all, $s_1 = \sigma_1$. Next, if $k \leq n$, then it is easy to verify the
truth of the following equalities:

$$s_{k-1}\sigma_1 = s_k + S(x_1^{k-1}x_2),{}^{*}$$
$$s_{k-2}\sigma_2 = S(x_1^{k-1}x_2) + S(x_1^{k-2}x_2x_3),$$
$$\ldots$$
$$s_{k-i}\sigma_i = S(x_1^{k-i+1}x_2 \ldots x_i) + S(x_1^{k-i}x_2 \ldots x_{i+1}), \quad 2 \leq i \leq k-2, \qquad (1)$$
$$\ldots$$
$$s_1\sigma_{k-1} = S(x_1^2x_2 \ldots x_{k-1}) + k\sigma_k$$

Taking the alternating sum of these equalities (that is, the sum
with alternating signs), and then transposing all terms to one side,
we get the following formula:

$$s_k - s_{k-1}\sigma_1 + s_{k-2}\sigma_2 - \ldots + (-1)^{k-1}s_1\sigma_{k-1} + (-1)^k k\sigma_k = 0 \quad (k \leq n) \qquad (2)$$

* See (11) of Sec. 52.
But if $k > n$, then the system (1) of equations takes the form

$$s_{k-1}\sigma_1 = s_k + S(x_1^{k-1}x_2),$$
$$\ldots$$
$$s_{k-i}\sigma_i = S(x_1^{k-i+1}x_2 \ldots x_i) + S(x_1^{k-i}x_2 \ldots x_{i+1}), \quad 2 \leq i \leq n-1,$$
$$\ldots$$
$$s_{k-n}\sigma_n = S(x_1^{k-n+1}x_2 \ldots x_n)$$

whence follows the formula

$$s_k - s_{k-1}\sigma_1 + s_{k-2}\sigma_2 - \ldots + (-1)^n s_{k-n}\sigma_n = 0 \quad (k > n) \qquad (3)$$

Formulas (2) and (3) are called Newton's formulas. They connect
power sums with elementary symmetric polynomials and permit one
to find, successively, the expressions for $s_1, s_2, s_3, \ldots$ in terms of
$\sigma_1, \sigma_2, \ldots, \sigma_n$. Thus, we know that $s_1 = \sigma_1$, which also follows from
formula (2). Furthermore, if $k = 2 \leq n$, then, by (2),
$s_2 - s_1\sigma_1 + 2\sigma_2 = 0$, whence

$$s_2 = \sigma_1^2 - 2\sigma_2$$

For $k = 3 \leq n$ we have $s_3 - s_2\sigma_1 + s_1\sigma_2 - 3\sigma_3 = 0$, whence, using
the expressions already found for $s_1$ and $s_2$, we get

$$s_3 = \sigma_1^3 - 3\sigma_1\sigma_2 + 3\sigma_3$$

which is already familiar to us [see (12) of Sec. 52]. Now if $k = 3$
but $n = 2$, then, by (3), $s_3 - s_2\sigma_1 + s_1\sigma_2 = 0$, whence
$s_3 = \sigma_1^3 - 3\sigma_1\sigma_2$. Using the Newton formulas, we can obtain a general
formula expressing $s_k$ in terms of $\sigma_1, \sigma_2, \ldots, \sigma_n$. True, this formula
is very unwieldy and so we will not give it.
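The successive computation described above is easy to mechanize. The sketch below solves formulas (2) and (3) for $s_k$ and evaluates the power sums from given values of $\sigma_1, \ldots, \sigma_n$; the function name and the sample values (the numbers 1, 2, 3, for which $\sigma_1 = 6$, $\sigma_2 = 11$, $\sigma_3 = 6$) are chosen here for illustration.

```python
def power_sums_from_sigma(sigma, kmax):
    """Return [s_1, ..., s_kmax] from sigma = [sigma_1, ..., sigma_n].

    Formula (2), k <= n:  s_k = sum_{i<k} (-1)^(i-1) sigma_i s_(k-i) + (-1)^(k-1) k sigma_k
    Formula (3), k > n:   s_k = sum_{i<=n} (-1)^(i-1) sigma_i s_(k-i)
    """
    n = len(sigma)
    s = []
    for k in range(1, kmax + 1):
        total = 0
        for i in range(1, min(k - 1, n) + 1):
            total += (-1) ** (i - 1) * sigma[i - 1] * s[k - i - 1]
        if k <= n:
            total += (-1) ** (k - 1) * k * sigma[k - 1]
        s.append(total)
    return s

# sigma_1, sigma_2, sigma_3 of the numbers 1, 2, 3
s = power_sums_from_sigma([6, 11, 6], 5)
# The case k = 3, n = 2 from the text, for the numbers 1, 2
s_two = power_sums_from_sigma([3, 2], 3)
```

The results can be compared with the direct sums $1^k + 2^k + 3^k$ and $1^k + 2^k$.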

If the base field $P$ has characteristic 0 and for this reason division
by any natural number $n$ is meaningful*, then formula (2) permits
successively expressing the elementary symmetric polynomials
$\sigma_1, \sigma_2, \ldots, \sigma_n$ in terms of the first $n$ power sums $s_1, s_2, \ldots, s_n$.
Thus, $\sigma_1 = s_1$ and therefore

$$\sigma_2 = \frac{1}{2}(s_1\sigma_1 - s_2) = \frac{1}{2}(s_1^2 - s_2),$$
$$\sigma_3 = \frac{1}{3}(s_3 - s_2\sigma_1 + s_1\sigma_2) = \frac{1}{6}(s_1^3 - 3s_1s_2 + 2s_3)$$

and so forth.

* In a field of characteristic $p$, the expression $\frac{a}{p}$ is meaningless for $a \neq 0$
since in this field $px = 0$ for any $x$.

From the foregoing and from the fundamental theorem follows the result that

Any symmetric polynomial in $n$ unknowns $x_1, x_2, \ldots, x_n$ over
a field $P$ of characteristic zero can be represented as a polynomial in
the power sums $s_1, s_2, \ldots, s_n$ with coefficients belonging to the field $P$.
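The inverse passage, from power sums to elementary symmetric polynomials, can be sketched the same way. The code below solves formula (2) for $\sigma_k$, which requires division by $k$ and therefore rational arithmetic; the function name is an assumption made for this illustration.

```python
from fractions import Fraction

def sigma_from_power_sums(s):
    """Given [s_1, ..., s_n], return [sigma_1, ..., sigma_n] using formula (2):
    k * sigma_k = sum_{i=1..k} (-1)^(i-1) * sigma_(k-i) * s_i, with sigma_0 = 1."""
    sigma = []
    for k in range(1, len(s) + 1):
        total = Fraction(0)
        for i in range(1, k + 1):
            previous = Fraction(1) if i == k else sigma[k - i - 1]
            total += (-1) ** (i - 1) * previous * s[i - 1]
        sigma.append(total / k)       # division by k needs characteristic 0
    return sigma

sigma = sigma_from_power_sums([6, 14, 36])    # power sums of the numbers 1, 2, 3
```

For these values the closed forms above give the same result: $\frac{1}{2}(6^2 - 14) = 11$ and $\frac{1}{6}(6^3 - 3\cdot 6\cdot 14 + 2\cdot 36) = 6$.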

Polynomials symmetric in two systems of unknowns. In the next
section, and also in Sec. 58, use will be made of a generalization of
the concept of a symmetric polynomial. Suppose we have two systems
of unknowns $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_r$, and suppose
their union

$$x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_r \qquad (4)$$

is algebraically independent over the field $P$. The polynomial
$f(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_r)$ over the field $P$ is called symmetric
in two systems of unknowns if it remains unchanged under any arrangements
of the unknowns $x_1, x_2, \ldots, x_n$ among themselves and of
the unknowns $y_1, y_2, \ldots, y_r$ among themselves. If we denote the
elementary symmetric polynomials in $x_1, x_2, \ldots, x_n$ by
$\sigma_1, \sigma_2, \ldots, \sigma_n$ and the elementary symmetric polynomials in
$y_1, y_2, \ldots, y_r$ by $\tau_1, \tau_2, \ldots, \tau_r$, then the fundamental theorem is
generalized as follows.

Any polynomial $f(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_r)$ over the field
$P$ that is symmetric with respect to the systems of unknowns
$x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_r$ can be represented as a polynomial
(with coefficients from $P$) in the elementary symmetric polynomials
with respect to these two systems of unknowns:

$$f(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_r) = \varphi(\sigma_1, \sigma_2, \ldots, \sigma_n, \tau_1, \tau_2, \ldots, \tau_r)$$

Indeed, the polynomial $f$ may be regarded as a polynomial
$\bar{f}(y_1, y_2, \ldots, y_r)$ with coefficients which are polynomials in
$x_1, x_2, \ldots, x_n$. Since $f$ remains unchanged under rearrangements of the
unknowns $x_1, x_2, \ldots, x_n$, it follows that the coefficients of the
polynomial $\bar{f}$ will be symmetric polynomials in $x_1, x_2, \ldots, x_n$ and
therefore, by the fundamental theorem, can be represented as polynomials
(with coefficients from $P$) in $\sigma_1, \sigma_2, \ldots, \sigma_n$. On the other
hand, the polynomial $\bar{f}(y_1, y_2, \ldots, y_r)$ regarded over the field
$P(x_1, x_2, \ldots, x_n)$ will be symmetric with respect to $y_1, y_2, \ldots, y_r$
and therefore can be represented as the polynomial
$\bar{\varphi}(\tau_1, \tau_2, \ldots, \tau_r)$. The coefficients of the polynomial $\bar{\varphi}$ will,
as was demonstrated at the beginning of this section, be expressed in terms
of the coefficients of $\bar{f}$ by means of addition and subtraction, and so they
too will be polynomials in $\sigma_1, \sigma_2, \ldots, \sigma_n$. This obviously leads us to the
desired expression for $f$ in terms of $\sigma_1, \sigma_2, \ldots, \sigma_n, \tau_1, \tau_2, \ldots, \tau_r$.


Example. The polynomial

$$f(x_1, x_2, x_3, y_1, y_2) = x_1x_2x_3 - x_1x_2y_1 - x_1x_2y_2 - x_1x_3y_1 - x_1x_3y_2
- x_2x_3y_1 - x_2x_3y_2 + x_1y_1y_2 + x_2y_1y_2 + x_3y_1y_2$$

is symmetric both with respect to the unknowns $x_1, x_2, x_3$ and to the unknowns
$y_1, y_2$, but is not symmetric with respect to the five unknowns taken together,
as is evident from, say, a transposition of the unknowns $x_1$ and $y_1$. Let us find
the expression for $f$ in terms of $\sigma_1, \sigma_2, \sigma_3, \tau_1, \tau_2$:

$$f = x_1x_2x_3 - (x_1x_2 + x_1x_3 + x_2x_3)y_1 - (x_1x_2 + x_1x_3 + x_2x_3)y_2 + (x_1 + x_2 + x_3)y_1y_2$$
$$= \sigma_3 - \sigma_2y_1 - \sigma_2y_2 + \sigma_1y_1y_2 = \sigma_3 - \sigma_2\tau_1 + \sigma_1\tau_2$$
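A quick numerical spot check of this example can be sketched as follows. It evaluates $f$ directly and via $\sigma_3 - \sigma_2\tau_1 + \sigma_1\tau_2$ at sample integer points (the sample values are arbitrary choices made here) and confirms that interchanging $x_1$ and $y_1$ changes the value.

```python
from itertools import product

def f(x1, x2, x3, y1, y2):
    """The polynomial of the example, written out term by term."""
    return (x1*x2*x3 - x1*x2*y1 - x1*x2*y2 - x1*x3*y1 - x1*x3*y2
            - x2*x3*y1 - x2*x3*y2 + x1*y1*y2 + x2*y1*y2 + x3*y1*y2)

def via_sigma_tau(x1, x2, x3, y1, y2):
    """The same polynomial as sigma_3 - sigma_2*tau_1 + sigma_1*tau_2."""
    s1 = x1 + x2 + x3
    s2 = x1*x2 + x1*x3 + x2*x3
    s3 = x1*x2*x3
    t1 = y1 + y2
    t2 = y1*y2
    return s3 - s2*t1 + s1*t2

# Agreement on a grid of integer points.
agree = all(f(*p) == via_sigma_tau(*p) for p in product(range(-2, 3), repeat=5))

value = f(1, 2, 3, 4, 5)
swapped_x = f(2, 1, 3, 4, 5)      # x1 <-> x2: value unchanged
swapped_y = f(1, 2, 3, 5, 4)      # y1 <-> y2: value unchanged
swapped_xy = f(4, 2, 3, 1, 5)     # x1 <-> y1: value changes
```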


The theorem just proved can naturally be extended to the case
of three or more systems of unknowns.

For polynomials symmetric with respect to two systems of unknowns,
the theorem of unique representation in terms of elementary
symmetric polynomials also holds true. In other words, the following
theorem is valid.

The combined system

$$\sigma_1, \sigma_2, \ldots, \sigma_n, \tau_1, \tau_2, \ldots, \tau_r$$

of elementary symmetric polynomials in the given systems of unknowns
$x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_r$ is algebraically independent over
the field $P$.

Indeed, suppose over the field $P$ there is a polynomial

$$\varphi(\sigma_1, \sigma_2, \ldots, \sigma_n, \tau_1, \tau_2, \ldots, \tau_r)$$

equal to zero although not all its coefficients are zeros. This polynomial
may be regarded as a polynomial $\bar{\varphi}(\tau_1, \tau_2, \ldots, \tau_r)$ with
coefficients which are polynomials in $\sigma_1, \sigma_2, \ldots, \sigma_n$. We can,
consequently, take it that $\bar{\varphi}$ is a polynomial in $\tau_1, \tau_2, \ldots, \tau_r$ over the
field of rational fractions

$$Q = P(x_1, x_2, \ldots, x_n)$$

The system $y_1, y_2, \ldots, y_r$ remains algebraically independent over
the field $Q$: if, in this system, there were algebraic dependence with
coefficients from $Q$, then, by eliminating the denominators, we would
obtain an algebraic dependence in system (4), which contradicts the
assumption. Proceeding from the uniqueness theorem of the preceding
section, we now find that the system $\tau_1, \tau_2, \ldots, \tau_r$ must also
be algebraically independent over the field $Q$, and therefore all coefficients
of the polynomial $\bar{\varphi}$ are equal to zero. However, these coefficients
are polynomials in $\sigma_1, \sigma_2, \ldots, \sigma_n$ and therefore, again on the
basis of the uniqueness theorem for the case of one system of unknowns
(this time, the system $x_1, x_2, \ldots, x_n$), all coefficients of these latter
polynomials are themselves zero. This proves that, in contradiction
with the hypothesis, all coefficients of the polynomial $\varphi$ must
be zero.
54. Resultant. Elimination of Unknown. Discriminant


If we have a polynomial $f(x_1, x_2, \ldots, x_n)$ from the ring
$P[x_1, x_2, \ldots, x_n]$, then its solution is a set of values of the unknowns

$$x_1 = \alpha_1, \quad x_2 = \alpha_2, \quad \ldots, \quad x_n = \alpha_n$$

taken in the field $P$ or in some extension $\bar{P}$ of this field, a set that
makes the polynomial $f$ vanish:

$$f(\alpha_1, \alpha_2, \ldots, \alpha_n) = 0$$

Every polynomial $f$ of degree greater than zero has solutions: if the
unknown $x_1$ occurs in the notation of this polynomial, then for
$\alpha_2, \ldots, \alpha_n$ we can actually take any elements of the field $P$, provided
only that the degree of the polynomial $f(x_1, \alpha_2, \ldots, \alpha_n)$ is strictly
positive, and then, using the theorem on the existence of a root
(Sec. 49), take an extension $\bar{P}$ of the field $P$ in which the polynomial
$f(x_1, \alpha_2, \ldots, \alpha_n)$ in the single unknown $x_1$ has the root $\alpha_1$. At the
same time, we see that the property of a polynomial of degree $n$
in one unknown to have, in any field, not more than $n$ roots ceases
to hold true for polynomials in several unknowns.

If we have several polynomials in n unknowns, we can pose the 
question of finding solutions that are common to all these polyno- 
mials; that is, solutions of the system of equations which is obtained 
by equating the given polynomials to zero. A particular case of this 
problem, namely the case of systems of linear equations, was consi- 
dered in detail in Chapter 2. However, concerning the opposite case 
of one equation in one unknown but of arbitrary degree, we know 
nothing about the roots except that they exist in some extension of 
the base field. Finding and studying solutions of an arbitrary non- 
linear system of equations in several unknowns is, quite understan- 
dably, a still more involved problem that goes beyond the scope of 
our present course and constitutes a special branch of mathematics 
known as algebraic geometry. Here, we confine ourselves to a system 
of two equations of arbitrary degree in two unknowns; we will show 
that this case can be reduced to that of one equation in one unknown. 

Let us first take up the question of the existence of common
roots of two polynomials in one unknown. Suppose we have the polynomials

$$f(x) = a_0x^n + a_1x^{n-1} + \ldots + a_{n-1}x + a_n,$$
$$g(x) = b_0x^s + b_1x^{s-1} + \ldots + b_{s-1}x + b_s \qquad (1)$$

over the field $P$, $a_0 \neq 0$, $b_0 \neq 0$.

From the results of the preceding chapter, it readily follows that
polynomials $f(x)$ and $g(x)$ have a common root in some extension of the
field $P$ if and only if they are not relatively prime. Thus, the question
of the existence of common roots of the given polynomials can be
resolved by applying the Euclidean algorithm.


We will now give another method. Let $\bar{P}$ be some extension of
the field $P$ in which $f(x)$ has $n$ roots $\alpha_1, \alpha_2, \ldots, \alpha_n$ and $g(x)$ has $s$
roots $\beta_1, \beta_2, \ldots, \beta_s$; for $\bar{P}$ we can take the splitting field for the
product $f(x)g(x)$. The element

$$R(f, g) = a_0^s b_0^n \prod_{i=1}^{n}\prod_{j=1}^{s}(\alpha_i - \beta_j) \qquad (2)$$

of the field $\bar{P}$ is called the resultant of the polynomials $f(x)$ and $g(x)$.
It is obvious that $f(x)$ and $g(x)$ have a common root in $\bar{P}$ if and only
if $R(f, g) = 0$. Since

$$g(x) = b_0\prod_{j=1}^{s}(x - \beta_j)$$

and therefore

$$g(\alpha_i) = b_0\prod_{j=1}^{s}(\alpha_i - \beta_j)$$

it follows that the resultant $R(f, g)$ can also be written as

$$R(f, g) = a_0^s\prod_{i=1}^{n} g(\alpha_i) \qquad (3)$$

The polynomials $f(x)$ and $g(x)$ are utilized in nonsymmetric
fashion in determining the resultant. Indeed,

$$R(g, f) = b_0^n a_0^s\prod_{j=1}^{s}\prod_{i=1}^{n}(\beta_j - \alpha_i) = (-1)^{ns}R(f, g) \qquad (4)$$

In accordance with (3), $R(g, f)$ may be written as

$$R(g, f) = b_0^n\prod_{j=1}^{s} f(\beta_j) \qquad (5)$$
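Formulas (2) to (5) can be compared directly on polynomials whose roots are known. The following sketch is an illustration with sample data chosen here: $f(x) = 2(x - 1)$ and $g(x) = (x - 2)(x + 1)(x - 5)$, so that $n = 1$, $s = 3$ and the sign $(-1)^{ns}$ in (4) is $-1$. The helper names are hypothetical.

```python
from math import prod

def resultant_from_roots(a0, alphas, b0, betas):
    """Formula (2): a0^s * b0^n * product of all differences alpha_i - beta_j."""
    n, s = len(alphas), len(betas)
    return a0 ** s * b0 ** n * prod(a - b for a in alphas for b in betas)

def evaluate(lead, roots, x):
    """Value at x of the polynomial lead * prod (x - root)."""
    return lead * prod(x - r for r in roots)

a0, alphas = 2, [1]           # f(x) = 2(x - 1)
b0, betas = 1, [2, -1, 5]     # g(x) = (x - 2)(x + 1)(x - 5)
n, s = len(alphas), len(betas)

r_fg = resultant_from_roots(a0, alphas, b0, betas)                  # formula (2)
r_gf = resultant_from_roots(b0, betas, a0, alphas)                  # R(g, f)
by_3 = a0 ** s * prod(evaluate(b0, betas, a) for a in alphas)       # formula (3)
by_5 = b0 ** n * prod(evaluate(a0, alphas, b) for b in betas)       # formula (5)
```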


Expression (2) for a resultant requires a knowledge of the roots
of the polynomials $f(x)$ and $g(x)$ and therefore is, in a practical sense,
useless for solving the problem of the existence of a common root of
these two polynomials. However, it turns out that the resultant
$R(f, g)$ may be represented in the form of a polynomial in the coefficients
$a_0, a_1, \ldots, a_n, b_0, b_1, \ldots, b_s$ of the polynomials $f(x)$ and $g(x)$.

The possibility of such a representation follows readily from the
results of the preceding section. Indeed, formula (2) shows that the
resultant $R(f, g)$ is a symmetric polynomial in two sets of unknowns:
the set $\alpha_1, \alpha_2, \ldots, \alpha_n$ and the set $\beta_1, \beta_2, \ldots, \beta_s$. Therefore, as
proved at the end of the preceding section, it can be represented in
the form of a polynomial in the elementary symmetric polynomials
with respect to these two systems of unknowns, that is, by the Vieta
formulas, as a polynomial in the quotients $\frac{a_i}{a_0}$, $i = 1, 2, \ldots, n$,
and $\frac{b_j}{b_0}$, $j = 1, 2, \ldots, s$; the factor $a_0^s b_0^n$ included in (2) eliminates
$a_0$ and $b_0$ from the denominators of the resulting expression. Incidentally,
it would be an arduous task to find the expression of the resultant
in terms of the coefficients by means of methods described in the
preceding sections, and so we will proceed differently.

The expression for the resultant of the polynomials (1) that we
will find will suit any pair of such polynomials. To be more precise,
we will take it that the set of roots

$$\alpha_1, \alpha_2, \ldots, \alpha_n, \beta_1, \beta_2, \ldots, \beta_s \qquad (6)$$

of the polynomials (1) is a set of $n + s$ independent unknowns, that
is, a set of $n + s$ elements which are algebraically independent over
the field $P$ in the sense of Sec. 51.

We will get an expression for the resultant, which expression,
regarded as a polynomial in the unknowns (6) (after replacement of
the coefficients by the roots via the Vieta formulas), will be equal
to the right member of (2); this member is also regarded as a polynomial
in the unknowns (6).

Regarding the equality precisely in the sense of an identity
in the set of unknowns (6), we will prove that the resultant $R(f, g)$ of
the polynomials (1) is equal to the following determinant of order $n + s$:


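$$D = \left|\begin{array}{ccccccc}
a_0 & a_1 & \ldots & a_n & & & \\
 & a_0 & a_1 & \ldots & a_n & & \\
 & & \ddots & \ddots & & \ddots & \\
 & & & a_0 & a_1 & \ldots & a_n \\
b_0 & b_1 & \ldots & b_s & & & \\
 & b_0 & b_1 & \ldots & b_s & & \\
 & & \ddots & \ddots & & \ddots & \\
 & & & b_0 & b_1 & \ldots & b_s
\end{array}\right| \qquad (7)$$

(the first $s$ rows contain the coefficients of $f$, the last $n$ rows the
coefficients of $g$; all vacancies are occupied by zeros). The structure of this
determinant is clear enough; it need only be noted that the coefficient $a_0$
appears $s$ times on the principal diagonal and the coefficient $b_s$ occurs
$n$ times.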

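To prove our assertion, we compute in two ways the product
$a_0^s b_0^n DM$, where $M$ is the auxiliary determinant of order $n + s$

$$M = \left|\begin{array}{cccccccc}
\beta_1^{n+s-1} & \beta_2^{n+s-1} & \ldots & \beta_s^{n+s-1} & \alpha_1^{n+s-1} & \alpha_2^{n+s-1} & \ldots & \alpha_n^{n+s-1} \\
\beta_1^{n+s-2} & \beta_2^{n+s-2} & \ldots & \beta_s^{n+s-2} & \alpha_1^{n+s-2} & \alpha_2^{n+s-2} & \ldots & \alpha_n^{n+s-2} \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\
\beta_1^2 & \beta_2^2 & \ldots & \beta_s^2 & \alpha_1^2 & \alpha_2^2 & \ldots & \alpha_n^2 \\
\beta_1 & \beta_2 & \ldots & \beta_s & \alpha_1 & \alpha_2 & \ldots & \alpha_n \\
1 & 1 & \ldots & 1 & 1 & 1 & \ldots & 1
\end{array}\right|$$

$M$ is the Vandermonde determinant and so it is equal (see Sec. 6)
to the product of the differences of the elements of its second last row,
any succeeding element being subtracted from any preceding element.
Thus,

$$M = \prod_{1 \leq i < j \leq s}(\beta_i - \beta_j) \cdot \prod_{i=1}^{s}\prod_{j=1}^{n}(\beta_i - \alpha_j) \cdot \prod_{1 \leq i < j \leq n}(\alpha_i - \alpha_j)$$

and therefore, by (4),

$$a_0^s b_0^n DM = D \cdot R(g, f) \cdot \prod_{1 \leq i < j \leq s}(\beta_i - \beta_j) \cdot \prod_{1 \leq i < j \leq n}(\alpha_i - \alpha_j) \qquad (8)$$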

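On the other hand, let us compute the product $DM$ on the basis
of the theorem on the determinant of a product of matrices. Multiplying
out the appropriate matrices and taking into account that all
$\alpha$ are roots of $f(x)$ and all $\beta$ are roots of $g(x)$, we get

$$DM = \left|\begin{array}{cccccccc}
\beta_1^{s-1}f(\beta_1) & \beta_2^{s-1}f(\beta_2) & \ldots & \beta_s^{s-1}f(\beta_s) & 0 & 0 & \ldots & 0 \\
\beta_1^{s-2}f(\beta_1) & \beta_2^{s-2}f(\beta_2) & \ldots & \beta_s^{s-2}f(\beta_s) & 0 & 0 & \ldots & 0 \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\
\beta_1 f(\beta_1) & \beta_2 f(\beta_2) & \ldots & \beta_s f(\beta_s) & 0 & 0 & \ldots & 0 \\
f(\beta_1) & f(\beta_2) & \ldots & f(\beta_s) & 0 & 0 & \ldots & 0 \\
0 & 0 & \ldots & 0 & \alpha_1^{n-1}g(\alpha_1) & \alpha_2^{n-1}g(\alpha_2) & \ldots & \alpha_n^{n-1}g(\alpha_n) \\
0 & 0 & \ldots & 0 & \alpha_1^{n-2}g(\alpha_1) & \alpha_2^{n-2}g(\alpha_2) & \ldots & \alpha_n^{n-2}g(\alpha_n) \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\
0 & 0 & \ldots & 0 & g(\alpha_1) & g(\alpha_2) & \ldots & g(\alpha_n)
\end{array}\right|$$

Applying the Laplace theorem, then taking common factors out of
the columns of the determinants and computing the remaining determinants
as Vandermonde determinants, we obtain

$$a_0^s b_0^n DM = a_0^s b_0^n \prod_{j=1}^{s} f(\beta_j) \cdot \prod_{1 \leq i < j \leq s}(\beta_i - \beta_j) \cdot \prod_{i=1}^{n} g(\alpha_i) \cdot \prod_{1 \leq i < j \leq n}(\alpha_i - \alpha_j)$$

or, using (3) and (5),

$$a_0^s b_0^n DM = R(f, g) \cdot R(g, f) \cdot \prod_{1 \leq i < j \leq s}(\beta_i - \beta_j) \cdot \prod_{1 \leq i < j \leq n}(\alpha_i - \alpha_j) \qquad (9)$$

We find that the right sides of (8) and (9), considered as polynomials
in the unknowns (6), are equal. Both sides of the resulting equation
can be reduced by common factors not identically zero. The
common factor $R(g, f)$ is not equal to zero: since $a_0 \neq 0$ and $b_0 \neq 0$
by hypothesis, it suffices to select for the unknowns (6) nonequal values
(in the base field or in some extension of it) in order to obtain from
(4) a nonzero value for the polynomial $R(g, f)$. In the same way, we
prove that the other two common factors are also different from zero.
Cancelling out common factors, we arrive at the equality

$$R(f, g) = D \qquad (10)$$

which is what we set out to prove.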

Let us now give up the requirement that the leading coefficients
of the polynomials (1) be different from zero*. Concerning the true
degrees of these polynomials, it is thus possible to assert only that
they do not exceed their "formal" degrees $n$ and, respectively, $s$. For
the resultant, the expression (2) is now meaningless, since it may be
that the polynomials in question have fewer roots than $n$ or $s$. On the
other hand, determinant (7) can be written now as well, and since it
is already proved that for $a_0 \neq 0$, $b_0 \neq 0$ this determinant is equal
to the resultant, it follows that in our general case too we can call
it the resultant of the polynomials $f(x)$ and $g(x)$ and denote it by
$R(f, g)$.

However, we can no longer hope that the fact that the resultant
is zero is equivalent to our polynomials having a root in common.
Indeed, if $a_0 = 0$ and $b_0 = 0$, then $R(f, g) = 0$, irrespective of
whether the polynomials $f$ and $g$ have common roots or not. It turns
out, however, that this case is the only case when one cannot conclude
that if the resultant is zero, the given polynomials have common
roots**. Namely, the following theorem is valid.

* This temporary rejection of the condition on the leading coefficient
of the polynomial, which was valid up to now, is due to subsequent
applications: we want to consider systems of polynomials in two unknowns and we
want to regard one of the unknowns as a coefficient. Thus, the leading
coefficient can vanish for particular values of this unknown.

** The determinant (7) is of course also equal to zero when $a_n = b_s = 0$.
However, in this case the polynomials (1) have a common root 0.

If we have polynomials (1) with arbitrary leading coefficients, then
the resultant (7) of these polynomials is zero if and only if the polynomials
have a common root or if their leading coefficients are both zero.

Proof. The case of $a_0 \neq 0$, $b_0 \neq 0$ has already been considered,
and the case of $a_0 = b_0 = 0$ is covered in the statement of the theorem.
It remains to consider the case when one of the leading coefficients
of the polynomials (1), say $a_0$, is nonzero and $b_0$ is equal to zero.

If $b_i = 0$ for all $i$, $i = 0, 1, \ldots, s$, then $R(f, g) = 0$ since the
determinant (7) contains zero rows. In this case, however, the polynomial
$g(x)$ is identically zero and therefore has common roots
with $f(x)$. However, if

$$b_0 = b_1 = \ldots = b_{k-1} = 0, \quad \text{but } b_k \neq 0, \quad k \leq s$$

and if

$$\bar{g}(x) = b_kx^{s-k} + b_{k+1}x^{s-k-1} + \ldots + b_{s-1}x + b_s$$

then, replacing the elements $b_0, b_1, \ldots, b_{k-1}$ in (7) with zeros and
applying the Laplace theorem, we obviously get

$$R(f, g) = a_0^k R(f, \bar{g}) \qquad (11)$$

But since the leading coefficients of both polynomials $f$ and $\bar{g}$ are
different from zero, it follows, from what was proved above, that the
equality $R(f, \bar{g}) = 0$ is necessary and sufficient for the polynomials
$f$ and $\bar{g}$ to have a root in common. On the other hand, by (11), the
equalities $R(f, g) = 0$ and $R(f, \bar{g}) = 0$ are equivalent, and since
the polynomials $g$ and $\bar{g}$ of course have the same roots, we find that
in the case at hand as well the fact that the resultant $R(f, g)$ is zero
is equivalent to the polynomials $f(x)$ and $g(x)$ having a common
root. This proves the theorem.

Let us find the resultant of the two quadratic polynomials

$$f(x) = a_0x^2 + a_1x + a_2, \qquad g(x) = b_0x^2 + b_1x + b_2$$

By (7),

$$R(f, g) = \left|\begin{array}{cccc}
a_0 & a_1 & a_2 & 0 \\
0 & a_0 & a_1 & a_2 \\
b_0 & b_1 & b_2 & 0 \\
0 & b_0 & b_1 & b_2
\end{array}\right|$$

or, computing the determinant via expansion by the first and third
rows,

$$R(f, g) = (a_0b_2 - a_2b_0)^2 - (a_0b_1 - a_1b_0)(a_1b_2 - a_2b_1) \qquad (12)$$

Thus, if we have the polynomials

$$f(x) = x^2 - 6x + 2, \qquad g(x) = x^2 + x + 5$$

then, by (12), $R(f, g) = 233$ and so these polynomials do not have
any roots in common. But if we have the polynomials

$$f(x) = x^2 - 4x - 5, \qquad g(x) = x^2 - 7x + 10$$

then $R(f, g) = 0$, which means that they have a common root, the
number 5.
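Both numerical examples can be reproduced from determinant (7) itself. The sketch below builds the matrix of (7) from coefficient lists (leading coefficient first) and evaluates its determinant by Gaussian elimination over the rationals; the function names are assumptions made for this illustration, and formula (12) is included for comparison.

```python
from fractions import Fraction

def sylvester(a, b):
    """Matrix of determinant (7) for f = a[0]x^n + ... + a[n], g = b[0]x^s + ... + b[s]."""
    n, s = len(a) - 1, len(b) - 1
    rows = [[0] * i + list(a) + [0] * (s - 1 - i) for i in range(s)]   # s rows of a's
    rows += [[0] * i + list(b) + [0] * (n - 1 - i) for i in range(n)]  # n rows of b's
    return rows

def determinant(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    size, det = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                 # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return det

def resultant(a, b):
    return determinant(sylvester(a, b))

def formula_12(a, b):
    """Formula (12) for two quadratic polynomials."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    return (a0*b2 - a2*b0) ** 2 - (a0*b1 - a1*b0) * (a1*b2 - a2*b1)

r1 = resultant([1, -6, 2], [1, 1, 5])       # x^2 - 6x + 2 and x^2 + x + 5
r2 = resultant([1, -4, -5], [1, -7, 10])    # x^2 - 4x - 5 and x^2 - 7x + 10
```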

Eliminating an unknown from a system of two equations in two
unknowns. Suppose we have two polynomials $f$ and $g$ in two unknowns
$x$ and $y$ with coefficients from some field $P$. We write the polynomials
in descending powers of $x$:

$$f(x, y) = a_0(y)x^k + a_1(y)x^{k-1} + \ldots + a_{k-1}(y)x + a_k(y),$$
$$g(x, y) = b_0(y)x^l + b_1(y)x^{l-1} + \ldots + b_{l-1}(y)x + b_l(y) \qquad (13)$$

The coefficients will be polynomials from the ring $P[y]$. We find the
resultant of $f$ and $g$, which are regarded as polynomials in $x$, and
denote it by $R_x(f, g)$. By (7) it will be a polynomial in the single unknown
$y$ with coefficients from the field $P$:

$$R_x(f, g) = F(y) \qquad (14)$$

Let the system of polynomials (13) have, in some extension of
the field $P$, the common solution $x = \alpha$, $y = \beta$. Substituting the
value $\beta$ in place of $y$ in (13), we get two polynomials $f(x, \beta)$ and
$g(x, \beta)$ in the one unknown $x$. These polynomials have the common
root $\alpha$ and therefore their resultant, which by (14) is equal to
$F(\beta)$, must be equal to zero, that is, $\beta$ must be a root of the resultant
$R_x(f, g)$. Conversely, if the resultant $R_x(f, g)$ of the polynomials
(13) has the root $\beta$, then the resultant of the polynomials $f(x, \beta)$
and $g(x, \beta)$ is zero. That is to say, either these polynomials have a
common root or both their leading coefficients are zero,

$$a_0(\beta) = b_0(\beta) = 0$$

The finding of common solutions of the system (13) of polynomials
is reduced to the finding of roots of the single polynomial (14) in the
single unknown $y$. We say that the unknown $x$ has been eliminated
from the system (13) of polynomials.

The next theorem relates to the question of the degree of the polynomial
which we obtain after eliminating one unknown from the
system of two polynomials in two unknowns.

If, taking the unknowns together, the polynomials $f(x, y)$ and
$g(x, y)$ are respectively of degrees $n$ and $s$, then the degree of the
polynomial $R_x(f, g)$ in the unknown $y$ does not exceed the product $ns$, if, of
course, this polynomial is not identically zero.

First of all, if we regard two polynomials in one unknown with
leading coefficients equal to unity, then, by (2), their resultant
$R(f, g)$ is a homogeneous polynomial in
$\alpha_1, \alpha_2, \ldots, \alpha_n, \beta_1, \beta_2, \ldots, \beta_s$ of degree $ns$. From this it follows
that if the term

$$a_1^{k_1}a_2^{k_2} \ldots a_n^{k_n}b_1^{l_1}b_2^{l_2} \ldots b_s^{l_s}$$

enters into the expression of the resultant via the coefficients
$a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_s$ and if the weight of this term is the
number

$$k_1 + 2k_2 + \ldots + nk_n + l_1 + 2l_2 + \ldots + sl_s$$

then all terms of $R(f, g)$ expressed via the coefficients have the same
weight equal to $ns$. This assertion also holds true in the general case
for terms of the resultant (7) if the number

$$0 \cdot k_0 + 1 \cdot k_1 + \ldots + n \cdot k_n + 0 \cdot l_0 + 1 \cdot l_1 + \ldots + s \cdot l_s \qquad (15)$$

is given as the weight of the term
$a_0^{k_0}a_1^{k_1} \ldots a_n^{k_n}b_0^{l_0}b_1^{l_1} \ldots b_s^{l_s}$.
Indeed, replacing the factors $a_0$ and $b_0$ by unity in the terms of
determinant (7), we arrive at the case that has already been considered;
however, the exponents on these factors enter into (15) with
coefficients 0.

Now write the polynomials $f$ and $g$ as follows:

$$f(x, y) = a_0(y)x^n + a_1(y)x^{n-1} + \ldots + a_n(y),$$
$$g(x, y) = b_0(y)x^s + b_1(y)x^{s-1} + \ldots + b_s(y)$$

Since $n$ is the degree of $f(x, y)$ in the unknowns jointly, the degree
of the coefficient $a_r(y)$, $r = 0, 1, 2, \ldots, n$, cannot exceed its index
$r$; this holds true for $b_r(y)$ as well. Whence it follows that the degree
of each term of the resultant $R_x(f, g)$ does not exceed the weight of
this term, which is to say it is not greater than the number $ns$. This
completes the proof.


Example 1. Find the common solutions to the following system of
polynomials:

$$f(x, y) = x^2 y + 3xy + 2y + 3,$$
$$g(x, y) = 2xy - 2x + 2y + 3$$

Eliminate x from this system; to do this, rewrite it as

$$f(x, y) = y \cdot x^2 + (3y) \cdot x + (2y + 3),$$
$$g(x, y) = (2y - 2) \cdot x + (2y + 3) \qquad (16)$$

then

$$R_x(f, g) = \begin{vmatrix} y & 3y & 2y + 3 \\ 2y - 2 & 2y + 3 & 0 \\ 0 & 2y - 2 & 2y + 3 \end{vmatrix} = 2y^2 + 11y + 12$$

The numbers $\beta_1 = -4$, $\beta_2 = -\frac{3}{2}$ will be the roots of the resultant. The leading
coefficients of the polynomials (16) do not vanish for these values of the unknown
y, and so each of them, together with some value for x, constitutes a solution
of the given system of polynomials. The polynomials

$$f(x, -4) = -4x^2 - 12x - 5,$$
$$g(x, -4) = -10x - 5$$

have the common root $\alpha_1 = -\frac{1}{2}$. The polynomials

$$f(x, -\tfrac{3}{2}) = -\tfrac{3}{2} x^2 - \tfrac{9}{2} x,$$
$$g(x, -\tfrac{3}{2}) = -5x$$

have the common root $\alpha_2 = 0$. Thus, the given system of polynomials has two
solutions:

$$\alpha_1 = -\tfrac{1}{2},\ \beta_1 = -4; \qquad \alpha_2 = 0,\ \beta_2 = -\tfrac{3}{2}$$
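
The computation of Example 1 can be checked numerically. The following is only a sketch in Python, not part of the course itself: the helper names `det3` and `resultant_x` are introduced here for illustration, and exact rational arithmetic from the standard `fractions` module is assumed. It evaluates the determinant for $R_x(f, g)$ at given values of y and substitutes the two solutions into f and g.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def resultant_x(y):
    """Determinant built from the coefficients of (16), evaluated at a given y."""
    y = Fraction(y)
    return det3([
        [y,         3 * y,     2 * y + 3],
        [2 * y - 2, 2 * y + 3, 0],
        [0,         2 * y - 2, 2 * y + 3],
    ])

def f(x, y):
    return x * x * y + 3 * x * y + 2 * y + 3

def g(x, y):
    return 2 * x * y - 2 * x + 2 * y + 3

# The determinant has degree at most 3 in y; agreement with 2y^2 + 11y + 12
# at eleven points therefore means the two polynomials coincide.
for t in range(-5, 6):
    assert resultant_x(t) == 2 * t * t + 11 * t + 12

solutions = [(Fraction(-1, 2), Fraction(-4)), (Fraction(0), Fraction(-3, 2))]
for x0, y0 in solutions:
    # Each line shows: x, y, resultant at y, f(x, y), g(x, y).
    print(x0, y0, resultant_x(y0), f(x0, y0), g(x0, y0))
```

For both pairs the last three printed values are zero, as the example asserts.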
Example 2. Eliminate one unknown from the system of polynomials

$$f(x, y) = 2x^3 y - xy^2 + x + 5,$$
$$g(x, y) = x^2 y^2 + 2xy^2 - 5y + 1$$

Since both polynomials are of degree 2 in the unknown y, whereas one of
them is of degree 3 in x, it is advisable to eliminate y. Rewrite the system as

$$f(x, y) = (-x) \cdot y^2 + (2x^3) \cdot y + (x + 5),$$
$$g(x, y) = (x^2 + 2x) \cdot y^2 - 5y + 1 \qquad (17)$$

and find its resultant, applying formula (12):

$$R_y(f, g) = [(-x) \cdot 1 - (x + 5)(x^2 + 2x)]^2 - [(-x)(-5) - 2x^3 (x^2 + 2x)] \, [2x^3 \cdot 1 - (x + 5)(-5)]$$
$$= 4x^8 + 8x^7 + 11x^6 + 84x^5 + 161x^4 + 154x^3 + 96x^2 - 125x$$

One of the roots of the resultant is 0. However, for this value of the unknown
x, both leading coefficients of the polynomials (17) vanish; and, as is readily
seen, the polynomials f (0, y) and g (0, y) do not have any common roots. We do
not have any method for finding the other roots of the resultant. We can only
assert that if we found them [say in the splitting field for $R_y(f, g)$], then not
one of them would make both leading coefficients of the polynomials (17)
vanish, and therefore each of these roots, together with some value for y (one
or even several), would constitute a solution of the given system of
polynomials.
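
The expansion in Example 2 is easy to get wrong by hand, so a mechanical check may be useful. The sketch below is an illustration only: polynomials in x are assumed to be stored as coefficient lists with the lowest degree first, and the helpers `padd`, `psub`, `pmul` and `trim` are names introduced here. It applies the expression for the resultant of two quadratics used in the example.

```python
def padd(p, q):
    """Sum of two polynomials given as coefficient lists, lowest degree first."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def psub(p, q):
    """Difference p - q."""
    return padd(p, [-c for c in q])

def pmul(p, q):
    """Product of two coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def trim(p):
    """Drop zero coefficients of the highest powers."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

# Coefficients of (17), each a polynomial in x (lowest degree first).
a0, a1, a2 = [0, -1], [0, 0, 0, 2], [5, 1]   # f = a0*y^2 + a1*y + a2
b0, b1, b2 = [0, 2, 1], [-5], [1]            # g = b0*y^2 + b1*y + b2

# (a0*b2 - a2*b0)^2 - (a0*b1 - a1*b0)*(a1*b2 - a2*b1)
first = psub(pmul(a0, b2), pmul(a2, b0))
second = psub(pmul(a0, b1), pmul(a1, b0))
third = psub(pmul(a1, b2), pmul(a2, b1))
resultant_y = trim(psub(pmul(first, first), pmul(second, third)))
print(resultant_y)  # coefficients of x^0, x^1, ..., x^8
```

The list printed corresponds to $4x^8 + 8x^7 + 11x^6 + 84x^5 + 161x^4 + 154x^3 + 96x^2 - 125x$; in particular the constant term is zero, so 0 is a root of the resultant.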

There are also methods for successively eliminating the unknowns
from systems with an arbitrary number of polynomials and unknowns.
They are too involved, however, to be included in this course.

Discriminant. By analogy with the question that led us to the
concept of a resultant, we can ask about the conditions under which
a polynomial f (x) of degree n from the ring P [x] has multiple roots.

Let

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_{n-1} x + a_n, \qquad a_0 \neq 0$$

and suppose that in some extension of the field P this polynomial
has the roots $\alpha_1, \alpha_2, \ldots, \alpha_n$. It is obvious that there will be equal
roots among them if and only if the following product is zero:

$$\Delta = (\alpha_2 - \alpha_1)(\alpha_3 - \alpha_1) \cdots (\alpha_n - \alpha_1) \times (\alpha_3 - \alpha_2)(\alpha_4 - \alpha_2) \cdots (\alpha_n - \alpha_2) \times \cdots \times (\alpha_n - \alpha_{n-1}) = \prod_{n \geq i > j \geq 1} (\alpha_i - \alpha_j)$$

or, equivalently, if the product

$$D = a_0^{2n-2} \prod_{n \geq i > j \geq 1} (\alpha_i - \alpha_j)^2$$

called the discriminant of the polynomial f (x) is zero.
Unlike the product $\Delta$, which can change sign upon a rearrangement
of the roots, the discriminant D is symmetric with respect to
$\alpha_1, \alpha_2, \ldots, \alpha_n$ and can therefore be expressed in terms of the
coefficients of the polynomial f (x). To find this expression, under the
assumption that the field P has characteristic zero, we can take
advantage of the connection between the discriminant of the polynomial
f (x) and the resultant of this polynomial and its derivative. It is
natural to expect such a connection: we know from Sec. 49 that a
polynomial has multiple roots if and only if it has roots in common with
the derivative f' (x) and therefore D = 0 if and only if R (f, f') = 0.
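By formula (8) of this section,

$$R(f, f') = a_0^{n-1} \prod_{i=1}^{n} f'(\alpha_i)$$

Differentiating

$$f(x) = a_0 \prod_{k=1}^{n} (x - \alpha_k)$$

we get

$$f'(x) = a_0 \sum_{k=1}^{n} \prod_{j \neq k} (x - \alpha_j)$$

After substitution of $\alpha_i$ instead of x, all terms, except the ith, vanish
and so

$$f'(\alpha_i) = a_0 \prod_{j \neq i} (\alpha_i - \alpha_j)$$

whence

$$R(f, f') = a_0^{n-1} \cdot a_0^{n} \prod_{i=1}^{n} \prod_{j \neq i} (\alpha_i - \alpha_j)$$

For any i and j, i > j, two factors enter into this product: $\alpha_i - \alpha_j$
and $\alpha_j - \alpha_i$. Their product is equal to $(-1) \cdot (\alpha_i - \alpha_j)^2$ and since
there are $\frac{n(n-1)}{2}$ pairs of indices i, j satisfying the inequalities
$n \geq i > j \geq 1$, it follows that

$$R(f, f') = (-1)^{\frac{n(n-1)}{2}} a_0^{2n-1} \prod_{n \geq i > j \geq 1} (\alpha_i - \alpha_j)^2 = (-1)^{\frac{n(n-1)}{2}} a_0 D$$
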
Example. Find the discriminant of the quadratic trinomial

$$f(x) = ax^2 + bx + c$$

Since f' (x) = 2ax + b, it follows that

$$R(f, f') = \begin{vmatrix} a & b & c \\ 2a & b & 0 \\ 0 & 2a & b \end{vmatrix} = a(-b^2 + 4ac)$$

In our case, $\frac{n(n-1)}{2} = 1$ and so

$$D = -a^{-1} R(f, f') = b^2 - 4ac$$

This coincides with what school algebra calls the discriminant of a quadratic
equation.
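Another way of finding the discriminant is the following. Form
a Vandermonde determinant from the powers of the roots $\alpha_1, \alpha_2, \ldots, \alpha_n$.
As indicated in Sec. 6,

$$\begin{vmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_n \\ \alpha_1^2 & \alpha_2^2 & \cdots & \alpha_n^2 \\ \vdots & \vdots & & \vdots \\ \alpha_1^{n-1} & \alpha_2^{n-1} & \cdots & \alpha_n^{n-1} \end{vmatrix} = \prod_{n \geq i > j \geq 1} (\alpha_i - \alpha_j) = \Delta$$

and so the discriminant is equal to the square of this determinant
multiplied by $a_0^{2n-2}$. Multiplying this determinant by its transpose
by the rule for matrix multiplication and recalling the power sums
defined in the preceding section, we get

$$D = a_0^{2n-2} \begin{vmatrix} n & s_1 & s_2 & \cdots & s_{n-1} \\ s_1 & s_2 & s_3 & \cdots & s_n \\ s_2 & s_3 & s_4 & \cdots & s_{n+1} \\ \vdots & \vdots & \vdots & & \vdots \\ s_{n-1} & s_n & s_{n+1} & \cdots & s_{2n-2} \end{vmatrix} \qquad (18)$$

where $s_k$ is the sum of the kth powers of the roots $\alpha_1, \alpha_2, \ldots, \alpha_n$.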


Example. Find the discriminant of the cubic polynomial $f(x) = x^3 + ax^2 + bx + c$. By (18),

$$D = \begin{vmatrix} 3 & s_1 & s_2 \\ s_1 & s_2 & s_3 \\ s_2 & s_3 & s_4 \end{vmatrix}$$

As we know from the preceding section,

$$s_1 = \sigma_1 = -a,$$
$$s_2 = \sigma_1^2 - 2\sigma_2 = a^2 - 2b,$$
$$s_3 = \sigma_1^3 - 3\sigma_1 \sigma_2 + 3\sigma_3 = -a^3 + 3ab - 3c$$

Using Newton's formula, we will also find that (because $\sigma_4 = 0$)

$$s_4 = \sigma_1^4 - 4\sigma_1^2 \sigma_2 + 4\sigma_1 \sigma_3 + 2\sigma_2^2 = a^4 - 4a^2 b + 4ac + 2b^2$$

Whence

$$D = 3s_2 s_4 + 2s_1 s_2 s_3 - s_2^3 - s_1^2 s_4 - 3s_3^2 = a^2 b^2 - 4b^3 - 4a^3 c + 18abc - 27c^2 \qquad (19)$$

In particular, for a = 0, i.e., for an incomplete cubic polynomial, we obtain

$$D = -4b^3 - 27c^2$$

in complete accordance with what was said in Sec. 38.
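
The three descriptions of the discriminant of a cubic with leading coefficient 1 (the product of squared root differences, the determinant (18) for n = 3, and the closed expression (19)) can be compared on cubics with known integral roots. This is a small sketch under those assumptions; the function names are introduced here only for the illustration.

```python
from itertools import combinations

def cubic_from_roots(r1, r2, r3):
    """Coefficients a, b, c of x^3 + a x^2 + b x + c with the given roots."""
    a = -(r1 + r2 + r3)
    b = r1 * r2 + r1 * r3 + r2 * r3
    c = -(r1 * r2 * r3)
    return a, b, c

def disc_by_roots(roots):
    """Product of the squared differences of the roots (leading coefficient 1)."""
    d = 1
    for u, v in combinations(roots, 2):
        d *= (u - v) ** 2
    return d

def disc_by_power_sums(a, b, c):
    """Determinant (18) for n = 3, with s_1, ..., s_4 expressed through a, b, c."""
    s1 = -a
    s2 = a**2 - 2*b
    s3 = -a**3 + 3*a*b - 3*c
    s4 = a**4 - 4*a**2*b + 4*a*c + 2*b**2
    return 3*s2*s4 + 2*s1*s2*s3 - s2**3 - s1**2*s4 - 3*s3**2

def disc_by_formula(a, b, c):
    """Closed expression (19)."""
    return a**2*b**2 - 4*b**3 - 4*a**3*c + 18*a*b*c - 27*c**2

for roots in [(1, 2, 3), (0, -1, 4), (2, 2, 5), (-3, 1, 1)]:
    a, b, c = cubic_from_roots(*roots)
    print(roots, disc_by_roots(roots),
          disc_by_power_sums(a, b, c), disc_by_formula(a, b, c))
```

On every line the three values agree, and they are zero exactly for the triples that contain a repeated root.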
55. Alternative Proof of the Fundamental Theorem
of the Algebra of Complex Numbers 


The proof of the fundamental theorem given in Sec. 23 was comple- 
tely nonalgebraic. Here we give another proof, which takes advanta- 
ge of an extensive algebraic apparatus: essential use is made of the 
fundamental theorem on symmetric polynomials (Sec. 52) and also 
of the theorem on the existence of a splitting field for any polynomial
(Sec. 49). At the same time, the nonalgebraic portion of the proof 
is minimal and is reduced to a single simple assertion. 

First note that in Sec. 23 we proved a lemma on the modulus of 
the highest-degree term of a polynomial. Taking the coefficients of 
a polynomial f (x) to be real and putting k = 1, we obtain the follo- 
wing corollary of this lemma. 

For real values of x sufficiently large in absolute value the sign of 
a polynomial f (x) with real coefficients coincides with the sign of the 
highest-degree term. 

From this follows the result that 

A polynomial of odd degree with real coefficients has at least one 
real root. 

Indeed, let

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_n$$

and all coefficients be real. Because of the oddness of n, the
highest-degree term $a_0 x^n$ has different signs for positive and negative values
of x, and therefore, as was proved above, the polynomial f (x)
will also have different signs for positive and negative values of x
sufficiently large in absolute value. There consequently exist real
values of x, say a and b, such that

$$f(a) < 0, \qquad f(b) > 0$$

However, from the course of analysis we know that a polynomial
(a rational integral function, that is) f (x) is a continuous function
and for this reason, because of one of the basic properties of
continuous functions, f (x) takes on any given value intermediate between
f (a) and f (b) for certain real values of x between a and b. In
particular, there is an $\alpha$ between a and b such that $f(\alpha) = 0$.
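
The argument above is also the basis of a practical way of locating such a root: once two points with values of opposite sign are known, the interval between them can be halved repeatedly. The following sketch is an illustration only and is not taken from the text. It assumes floating-point arithmetic, and it takes the number $1 + \max |a_i / a_0|$, a standard bound exceeding the absolute values of all the roots, as a value of x "sufficiently large in absolute value"; the names `horner` and `real_root_odd_degree` are introduced here.

```python
def horner(coeffs, x):
    """Value of a_0 x^n + ... + a_n at x; coeffs = [a_0, ..., a_n]."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value

def real_root_odd_degree(coeffs, tol=1e-10):
    """Approximate a real root of an odd-degree real polynomial by bisection."""
    assert (len(coeffs) - 1) % 2 == 1 and coeffs[0] != 0
    # Beyond this bound the sign of f(x) is the sign of the leading term.
    bound = 1 + max(abs(c / coeffs[0]) for c in coeffs[1:])
    a, b = -bound, bound
    fa = horner(coeffs, a)
    # f(a) and f(b) have opposite signs, so a root lies between a and b.
    while b - a > tol:
        m = (a + b) / 2
        fm = horner(coeffs, m)
        if fm == 0:
            return m
        if (fm < 0) == (fa < 0):
            a, fa = m, fm      # the sign change is in the right half
        else:
            b = m              # the sign change is in the left half
    return (a + b) / 2

root = real_root_odd_degree([1, 0, -2, -5])   # x^3 - 2x - 5
print(root, horner([1, 0, -2, -5], root))
```

For $x^3 - 2x - 5$ the procedure returns a value close to 2.0946, at which the polynomial is zero to within rounding error.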

Using this result, we will prove the following assertion.

Every polynomial of arbitrary degree with real coefficients has at
least one complex root.

Indeed, suppose we have a polynomial f (x) with real coefficients
having degree $n = 2^k q$, where q is an odd number. Since the case
k = 0 has already been considered (see above), we shall assume
k > 0, that is, we consider n an even number and we will argue by
induction with respect to k, on the assumption that our assertion has
been proved for all polynomials with real coefficients whose degrees
are divisible by $2^{k-1}$ but not divisible by $2^k$.*

Let P be a splitting field for the polynomial f (x) over the field
of complex numbers (see Sec. 49), and let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be the
roots of f (x) in P. Choose an arbitrary real number c and take the
elements of the field P having the form

$$\beta_{ij} = \alpha_i \alpha_j + c(\alpha_i + \alpha_j), \qquad i < j \qquad (1)$$

The number of elements $\beta_{ij}$ is obviously equal to

$$\frac{n(n-1)}{2} = \frac{2^k q (2^k q - 1)}{2} = 2^{k-1} q' \qquad (2)$$

where q' is an odd number.
Let us now construct from the ring P [x] a polynomial g (x)
having for its roots all the elements $\beta_{ij}$ and only these elements:

$$g(x) = \prod_{i < j} (x - \beta_{ij})$$

The coefficients of this polynomial are elementary symmetric
polynomials in $\beta_{ij}$. Consequently, by (1), they will be polynomials
in $\alpha_1, \alpha_2, \ldots, \alpha_n$ with real coefficients (since the number c is real);
they will even be symmetric polynomials. Indeed, a transposition
of any two $\alpha$, say $\alpha_k$ and $\alpha_l$, implies merely a rearrangement in
the set of all $\beta_{ij}$: every $\beta_{kj}$, where j is different from k and from l,
is converted into $\beta_{lj}$, and conversely, whereas $\beta_{kl}$ and all $\beta_{ij}$, for i
and j different from k and l, remain fixed. But the coefficients of
the polynomial g (x) remain unchanged under a rearrangement of its
roots.

From this it follows, by the fundamental theorem on symmetric
polynomials, that the coefficients of the polynomial g (x) will be
polynomials (with real coefficients) in the coefficients of the given
polynomial f (x) and for this reason will themselves be real numbers.
The degree of this polynomial, which is equal to the number of the
roots $\beta_{ij}$, is divisible, according to (2), by $2^{k-1}$ but is not divisible
by $2^k$. And so, by the induction hypothesis, at least one of the roots
$\beta_{ij}$ of the polynomial g (x) must be a complex number.

Thus, for any choice of the real number c there is a pair of
indices i, j, $1 \leq i \leq n$, $1 \leq j \leq n$, such that the element
$\alpha_i \alpha_j + c(\alpha_i + \alpha_j)$ is a complex number (recall that the field P contains
the field of complex numbers as a subfield). Quite naturally, to any
other choice of the number c there will, generally speaking,
correspond (in the indicated sense) another pair of indices. However,
there exist an infinitude of distinct real numbers c, whereas we have
at our disposal only a finite number of distinct pairs i, j. Whence it
follows that we can choose two distinct real numbers $c_1$ and $c_2$,
$c_1 \neq c_2$, such that they are associated with one and the same pair of
indices i, j, for which

$$\alpha_i \alpha_j + c_1(\alpha_i + \alpha_j) = a,$$
$$\alpha_i \alpha_j + c_2(\alpha_i + \alpha_j) = b \qquad (3)$$

are complex numbers.

From equalities (3) it follows that

$$(c_1 - c_2)(\alpha_i + \alpha_j) = a - b$$

whence

$$\alpha_i + \alpha_j = \frac{a - b}{c_1 - c_2}$$

That is to say, this sum is a complex number. From this and, say,
from the first of the equalities (3) it follows that the product $\alpha_i \alpha_j$
will also be a complex number. Thus, the elements $\alpha_i$ and $\alpha_j$ are
roots of the quadratic equation

$$x^2 - (\alpha_i + \alpha_j) x + \alpha_i \alpha_j = 0$$

with complex coefficients and therefore, as follows from the formula
(derived in Sec. 38) for solving quadratic equations with complex
coefficients, they must themselves be complex numbers. Thus,
among the roots of the polynomial f (x) we have even found two
complex roots and the proof of our assertion is complete.

* Consequently, this degree can even be greater than n.

For complete proof of the fundamental theorem, we have yet
to consider the case of a polynomial with arbitrary complex
coefficients. Let

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_n$$

be such a polynomial. Take the polynomial

$$\bar{f}(x) = \bar{a}_0 x^n + \bar{a}_1 x^{n-1} + \ldots + \bar{a}_n$$

obtained from f (x) by replacing all coefficients with conjugate
complex numbers and then consider the product

$$F(x) = f(x) \bar{f}(x) = b_0 x^{2n} + b_1 x^{2n-1} + \ldots + b_k x^{2n-k} + \ldots + b_{2n}$$

where, evidently,

$$b_k = \sum_{i+j=k} a_i \bar{a}_j, \qquad k = 0, 1, 2, \ldots, 2n$$

Using the familiar properties of conjugate complex numbers (see
Sec. 18), we find that

$$\bar{b}_k = \sum_{i+j=k} \bar{a}_i a_j = b_k$$

That is, all coefficients of the polynomial F (x) prove to be real.
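
A small numerical instance may make this step concrete. The sketch below is only an illustration: the polynomial $f(x) = (1 + 2i) x^2 - 3i x + (2 - i)$ is chosen arbitrarily, coefficient lists are assumed to be written with the highest degree first, and `poly_mul` and `conjugate_poly` are names introduced here. It uses Python's built-in complex numbers.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b   # contributes to b_k with k = i + j
    return out

def conjugate_poly(p):
    """Replace every coefficient by its complex conjugate."""
    return [c.conjugate() for c in p]

f = [1 + 2j, -3j, 2 - 1j]              # a_0, a_1, a_2
F = poly_mul(f, conjugate_poly(f))     # b_0, ..., b_4
print(F)
```

Every coefficient printed has zero imaginary part; the real parts are 5, −12, 9, 6, 5, so that here $F(x) = 5x^4 - 12x^3 + 9x^2 + 6x + 5$.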


It then follows, as proved above, that the polynomial F (x) has
at least one complex root $\beta$,

$$F(\beta) = f(\beta) \bar{f}(\beta) = 0$$

That is, either $f(\beta) = 0$ or $\bar{f}(\beta) = 0$. In the former case, the theorem
is proved. But if the latter case occurs, that is,

$$\bar{a}_0 \beta^n + \bar{a}_1 \beta^{n-1} + \ldots + \bar{a}_n = 0$$

then, replacing all complex numbers here by their conjugates
(which, as we know, does not affect the equality), we get

$$f(\bar{\beta}) = a_0 \bar{\beta}^n + a_1 \bar{\beta}^{n-1} + \ldots + a_n = 0$$

Thus, f (x) has the complex number $\bar{\beta}$ for its root. This completes
the proof of the fundamental theorem.


CHAPTER 12 


POLYNOMIALS 
WITH RATIONAL 
COEFFICIENTS 


56. Reducibility of Polynomials over the Field 
of Rationals 


The field of rational numbers, R, is the third number field of
particular interest to us, along with the fields of real and complex
numbers. It is the smallest of the number fields: as proved in Sec. 43,
the field R is contained in its entirety in any number field. We will
now investigate the question of the reducibility of polynomials over
the field of rationals; in the next section we deal with the rational
(integral and fractional) roots of polynomials with rational
coefficients. We stress once again that these are two different things: the
polynomial

$$x^4 + 2x^2 + 1 = (x^2 + 1)^2$$

is reducible over the field of rational numbers, though it does not have
a single rational root.

What can be said about the reducibility of polynomials over the 
field R? First of all, note that if we have a polynomial f (x) whose 
coefficients are rational but are not all integral, then, reducing the 
coefficients to a common denominator and multiplying f (x) by this 
denominator (equal, say, to k), we obtain a polynomial kf (x), all 
the coefficients of which will now be integers. It is evident that the 
polynomials f (x) and kf (x) have the same roots; on the other hand, 
they will at the same time be reducible or irreducible over the 
field R. 

However, we are not yet entitled to confine ourselves to a
consideration of polynomials with integral coefficients. Indeed, let the
integral polynomial g (x) (i.e., a polynomial with integral coeffici- 
ents) be reducible over the field of rationals, i.e., factorable into 
lower-degree factors with rational (in the general case, fractional) 
coefficients. Does factorability of g (x) into factors with integral 
coefficients follow from this? In other words, might it not be true 
that a polynomial with integral coefficients that is reducible over 
the field of rational numbers is irreducible over the ring of integers? 
The answers may be obtained via considerations similar to those
carried out in Sec. 51. Let us call a polynomial f (x) with integral
coefficients primitive if its coefficients are jointly relatively prime,
that is, if they do not have any common divisors different from 1 and
−1. If we have an arbitrary polynomial $\varphi(x)$ with rational
coefficients, it may be uniquely represented in the form of a product of
a lowest-terms fraction by some primitive polynomial:

$$\varphi(x) = \frac{a}{b} f(x) \qquad (1)$$

To do this, factor out the common denominator of all coefficients of
the polynomial $\varphi(x)$ and then also the common factors of the
numerators of these coefficients; note that the degree of f (x) is the same
as that of $\varphi(x)$. The uniqueness (to within sign) of the representation
(1) is proved as follows. Let

$$\varphi(x) = \frac{a}{b} f(x) = \frac{c}{d} g(x)$$

where g (x) is again a primitive polynomial. Then

$$ad f(x) = bc g(x)$$

Thus, ad and bc are obtained by taking all the common factors
out of the coefficients of one and the same integral polynomial,
and therefore they can differ in sign alone. Whence it follows that
the primitive polynomials f (x) and g (x) can likewise differ only
in sign.

The Gaussian lemma holds true for integral primitive
polynomials.

The product of two integral primitive polynomials is a primitive
polynomial.

Indeed, suppose we have the integral primitive polynomials

$$f(x) = a_0 x^k + a_1 x^{k-1} + \ldots + a_i x^{k-i} + \ldots + a_k,$$
$$g(x) = b_0 x^l + b_1 x^{l-1} + \ldots + b_j x^{l-j} + \ldots + b_l$$

and let

$$f(x) g(x) = c_0 x^{k+l} + c_1 x^{k+l-1} + \ldots + c_{i+j} x^{k+l-(i+j)} + \ldots + c_{k+l}$$

If this product is not primitive, then there is a prime p that
serves as a common divisor of all coefficients $c_0, c_1, \ldots, c_{k+l}$. Since
all the coefficients of the primitive polynomial f (x) cannot be
divisible by p, let the coefficient $a_i$ be the first one not divisible by p.
Similarly, denote by $b_j$ the first coefficient of the polynomial g (x)
not divisible by p. Multiplying f (x) and g (x) termwise and
collecting terms in $x^{k+l-(i+j)}$, we obtain

$$c_{i+j} = a_i b_j + a_{i-1} b_{j+1} + a_{i-2} b_{j+2} + \ldots + a_{i+1} b_{j-1} + a_{i+2} b_{j-2} + \ldots$$

The left side is divisible by p. Also, all terms on the right are
certainly divisible by p, except the first. Indeed, by the conditions
imposed on the choice of i and j, all the coefficients $a_{i-1}, a_{i-2}, \ldots$, and
also $b_{j-1}, b_{j-2}, \ldots$ are divisible by p. It then follows that the
product $a_i b_j$ is also divisible by p and therefore, due to the primality
of the number p, p should divide at least one of the coefficients $a_i$,
$b_j$, which, however, is not the case. The lemma is proved.
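
The lemma can be tried out on particular polynomials. In the sketch below (an illustration only, with `content` and `poly_mul` as names introduced here and with arbitrarily chosen coefficients) the greatest common divisor of the coefficients is computed for two primitive polynomials and for their product.

```python
from functools import reduce
from math import gcd

def content(p):
    """Greatest common divisor of the coefficients of an integral polynomial."""
    return reduce(gcd, (abs(c) for c in p))

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

f = [6, 10, 15]       # 6x^2 + 10x + 15, primitive although no coefficient is 1
g = [4, 0, 9, 6]      # 4x^3 + 9x + 6, primitive as well
product = poly_mul(f, g)
print(content(f), content(g), product, content(product))
```

The product is $24x^5 + 40x^4 + 114x^3 + 126x^2 + 195x + 90$, and the greatest common divisor of its coefficients is again 1, in agreement with the lemma.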
Let us now answer the questions posed above. Let a polynomial
g (x) of degree n with integral coefficients be reducible over the field
of rational numbers:

$$g(x) = \varphi_1(x) \varphi_2(x)$$

where $\varphi_1(x)$ and $\varphi_2(x)$ are polynomials with rational coefficients
and of degree less than n. Then

$$\varphi_i(x) = \frac{a_i}{b_i} f_i(x), \qquad i = 1, 2$$

where $\frac{a_i}{b_i}$ is in lowest terms and $f_i(x)$ is a primitive polynomial.
Then

$$g(x) = \frac{a_1 a_2}{b_1 b_2} [f_1(x) f_2(x)]$$

The left member is an integral polynomial and so the denominator
$b_1 b_2$ in the right member must cancel out. But the polynomial in
brackets will, by the Gaussian lemma, be primitive, and so any
prime factor from $b_1 b_2$ can cancel out only with some prime factor
from $a_1 a_2$, and since $a_i$ and $b_i$ are relatively prime, i = 1, 2, the
number $a_1$ must be exactly divisible by $b_2$, and $a_2$ by $b_1$:

$$a_1 = b_2 a_1', \qquad a_2 = b_1 a_2'$$

Whence

$$g(x) = a_1' a_2' f_1(x) f_2(x)$$

Adjoining the coefficient $a_1' a_2'$ to any one of the factors $f_1(x)$, $f_2(x)$,
we obtain a factorization of the polynomial g (x) into factors of lower
degree with integral coefficients. This is the proof of the following
theorem.

A polynomial with integral coefficients that is irreducible over the 
ring of integers will also be irreducible over the field of rational numbers. 

We can now restrict ourselves, in questions relating to the redu- 
cibility of polynomials over the field of rationals, to a consideration 
of factorizations of integral polynomials into factors whose coeffi- 
cients are all likewise integral. 

We know that any polynomial of degree greater than unity is
reducible over the field of complex numbers, and any polynomial
(with real coefficients) of degree greater than two is reducible over
the field of real numbers. The situation regarding the field of
rational numbers is quite different: for any n there is a polynomial of degree
n with rational (even integral) coefficients that is irreducible over the
field of rational numbers. The proof of this assertion is based on the
following sufficient criterion of the irreducibility of a polynomial
over the field R, called the Eisenstein criterion.

Suppose we have the polynomial

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_{n-1} x + a_n$$

with integral coefficients. If a prime number p can be chosen, in at
least one way, so that the following requirements are satisfied:

(1) the leading coefficient $a_0$ is not divisible by p,

(2) all the other coefficients are divisible by p,

(3) the constant term is divisible by p but not by $p^2$,

then the polynomial f (x) is irreducible over the field of rational
numbers.

Indeed, if the polynomial f (x) is reducible over the field R, then
it can be factored into two factors of lower degree with integral
coefficients:

$$f(x) = (b_0 x^k + b_1 x^{k-1} + \ldots + b_k)(c_0 x^l + c_1 x^{l-1} + \ldots + c_l)$$

where k < n, l < n, k + l = n. From this, comparing coefficients
in both members of the equation, we obtain

$$a_n = b_k c_l,$$
$$a_{n-1} = b_k c_{l-1} + b_{k-1} c_l,$$
$$a_{n-2} = b_k c_{l-2} + b_{k-1} c_{l-1} + b_{k-2} c_l, \qquad (2)$$
$$\ldots$$
$$a_0 = b_0 c_0$$

From the first of the equalities (2) it follows that, since $a_n$ is
divisible by p and p is prime, one of the factors $b_k$, $c_l$ must be divisible
by p. Both cannot be divisible by p at the same time since $a_n$, by
hypothesis, is not divisible by $p^2$. For instance, let p divide $b_k$;
therefore $c_l$ is prime to p. We now go over to the second of the
equalities (2). Its left member and also the first term in the right member
are divisible by p, and so p divides the product $b_{k-1} c_l$ as well, but
since p does not divide $c_l$, p does divide $b_{k-1}$. In the same fashion, we
find from the third equality of (2) that p divides $b_{k-2}$, and so on.
Finally, from the (k + 1)th equality it will be found that p divides
$b_0$; but then from the last equality of (2) it follows that p divides $a_0$,
which contradicts our assumption.

It is extremely easy to write, for any n, integral polynomials 
of degree n that satisfy the conditions of the Eisenstein criterion and, 
hence, are irreducible over the field of rational numbers. Such, for 
example, is the polynomial $x^n + 2$; the Eisenstein criterion is
applicable for p = 2.
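
The three requirements of the criterion are mechanical to verify for a given prime. The following is a minimal sketch, assuming that the coefficients $a_0, a_1, \ldots, a_n$ are given as a list of integers with the highest degree first and that the caller supplies a prime p (primality is not checked); the name `eisenstein_applies` is introduced here.

```python
def eisenstein_applies(coeffs, p):
    """True if conditions (1)-(3) of the Eisenstein criterion hold for the prime p."""
    a0, rest, an = coeffs[0], coeffs[1:], coeffs[-1]
    return (a0 % p != 0                          # (1) p does not divide a_0
            and all(c % p == 0 for c in rest)    # (2) p divides a_1, ..., a_n
            and an % (p * p) != 0)               # (3) p^2 does not divide a_n

print(eisenstein_applies([1, 0, 0, 0, 0, 2], 2))   # x^5 + 2
print(eisenstein_applies([1, -5, 6], 2))           # x^2 - 5x + 6
print(eisenstein_applies([1, -5, 6], 3))
```

The first call returns True, so $x^5 + 2$ is irreducible over R. A result of False for some prime, as in the other two calls, decides nothing by itself: the criterion is only a sufficient condition.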
The Eisenstein criterion is only a sufficient condition for
irreducibility over the field R, but by no means is it a necessary condition:
if it is not possible, for a given polynomial f (x), to find a prime
number p such that the conditions of the Eisenstein criterion are
fulfilled, it may be reducible, like $x^2 - 5x + 6$, but it may also be
irreducible, like $x^2 + 1$. There are a large number of other sufficient
criteria besides the Eisenstein criterion (though less important)
for irreducibility of polynomials over the field R. There is also a
method, due to Kronecker, which permits one to decide whether any
polynomial with integral coefficients is reducible or not over R.
However, it is very unwieldy and hardly applicable in
practice.


Example. Consider the polynomial

$$f_p(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + x^{p-2} + \ldots + x + 1$$

where p is a prime number. The roots of this polynomial are pth roots of unity
different from unity itself; since these roots, together with 1, divide the unit
circle of the complex plane into p equal parts, the polynomial $f_p(x)$ is called
a cyclotomic polynomial.

The Eisenstein criterion cannot be directly applied to this polynomial.
But by changing the variable, setting x = y + 1, we get

$$g(y) = f_p(y + 1) = \frac{(y + 1)^p - 1}{y} = \frac{1}{y} \left[ y^p + p y^{p-1} + \frac{p(p-1)}{2} y^{p-2} + \ldots + p y \right]$$
$$= y^{p-1} + p y^{p-2} + \frac{p(p-1)}{2} y^{p-3} + \ldots + p$$

The coefficients of the polynomial g (y) are binomial coefficients and so all,
except the leading coefficient, are divisible by p; the constant term is not
divisible by $p^2$. Thus, by the Eisenstein criterion, the polynomial g (y) is
irreducible over the field R, whence follows the irreducibility over R of the cyclotomic
polynomial $f_p(x)$. Indeed, if

$$f_p(x) = \varphi(x) \psi(x)$$

then

$$g(y) = \varphi(y + 1) \psi(y + 1)$$

that is, a factorization of $f_p(x)$ would yield a factorization of g (y).
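
The substitution x = y + 1 can be carried out for particular primes. The sketch below is an illustration only: it relies on the expansion $(y + 1)^p - 1 = \sum_{k=1}^{p} \binom{p}{k} y^k$, uses `math.comb` from the Python standard library (Python 3.8 or later), and introduces the names `shifted_cyclotomic` and `eisenstein_applies` for the purpose.

```python
from math import comb

def shifted_cyclotomic(p):
    """Coefficients of g(y) = ((y + 1)^p - 1) / y, highest degree first."""
    # Dividing sum_{k=1..p} C(p, k) y^k by y leaves C(p, k) as the coefficient of y^(k-1).
    return [comb(p, k) for k in range(p, 0, -1)]

def eisenstein_applies(coeffs, p):
    """Conditions of the Eisenstein criterion for integer coeffs and a prime p."""
    return (coeffs[0] % p != 0
            and all(c % p == 0 for c in coeffs[1:])
            and coeffs[-1] % (p * p) != 0)

for p in (2, 3, 5, 7, 11, 13):
    g = shifted_cyclotomic(p)
    print(p, g, eisenstein_applies(g, p))
```

For p = 5, for instance, the list is 1, 5, 10, 10, 5, that is, $g(y) = y^4 + 5y^3 + 10y^2 + 10y + 5$, and the criterion applies with the same prime p in every case listed.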


57. Rational Roots of Integral Polynomials 


It was pointed out above that the question of the factorization 
of a given polynomial over the field of rational numbers into irre- 
ducible factors has no really satisfactory practical solution. Howe- 
ver, the particular case referring to the isolation of linear factors of a 
polynomial with rational coefficients, that is, to the finding of its 
rational roots, is very simple and may be solved without exces- 
sive computations. Quite naturally, the problem of finding rational 
roots of a polynomial with rational coefficients does not in the least 
exhaust the general problem of the real roots of these polynomials; 
that is to say, the methods and results given in Chapter 9 are valid
in toto when applied to polynomials with rational coefficients.

As we take up the question of finding the rational roots of poly- 
nomials with rational coefficients, it is well to note that, as indicated 
in the preceding section, we can confine ourselves to polynomials 
with integral coefficients. We shall consider separately the case of 
integral and that of fractional roots. 

If an integer $\alpha$ is a root of a polynomial f (x) with integral
coefficients, then $\alpha$ is a divisor of the constant term of the polynomial.

Indeed, let

$$f(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_n$$

Divide f (x) by $x - \alpha$:

$$f(x) = (x - \alpha)(b_0 x^{n-1} + b_1 x^{n-2} + \ldots + b_{n-1})$$

Performing the division by the Horner method (see Sec. 22),
we find that all coefficients of the quotient, including $b_{n-1}$, are integers,
and since

$$a_n = -\alpha b_{n-1} = \alpha (-b_{n-1})$$

our assertion is proved.*

Thus, if an integral polynomial f (x) has integral roots, they will 
be found among the divisors of the constant term. It is thus necessary 
to test all possible divisors (both positive and negative) of the con- 
stant term. If none is a root of the polynomial, then the polynomial 
has no integral roots at all. 

To test all the divisors of the constant term may turn out to be
extremely laborious even if the values of the polynomial are
computed by the Horner method and not via direct substitution
of each of the divisors in place of the unknown. The following
remarks somewhat simplify computations. First of all, since both
1 and −1 are always divisors of the constant term, we compute f (1)
and f (−1). This presents no difficulties. Now if the integer $\alpha$ is a root
of f (x),

$$f(x) = (x - \alpha) q(x)$$

then, as indicated above, all the coefficients of the quotient q (x)
will be integers, and therefore the quotients

$$\frac{f(1)}{\alpha - 1} = -q(1), \qquad \frac{f(-1)}{\alpha + 1} = -q(-1)$$

must be integers. Thus, only such divisors $\alpha$ of the constant term (from
among those which differ from 1 and −1) have to be tested, relative to
which each of the quotients $\frac{f(1)}{\alpha - 1}$, $\frac{f(-1)}{\alpha + 1}$ is an integer.

* It would be wrong to attempt to prove this theorem by referring to the
fact that the constant term $a_n$ is (to within sign) a product of the roots of the
polynomial f (x): these roots can include fractional, irrational, and complex
roots, and one cannot, therefore, assert beforehand that the product of all these
roots (except $\alpha$) will be integral.

Example 1. Find the integral roots of the polynomial

$$f(x) = x^3 - 2x^2 - x - 6$$

The numbers ±1, ±2, ±3, ±6 are divisors of the constant term. Since
f (1) = −8, f (−1) = −8, it follows that 1 and −1 are not roots. Furthermore,
the numbers

$$\frac{-8}{2 + 1}, \quad \frac{-8}{-2 - 1}, \quad \frac{-8}{6 - 1}, \quad \frac{-8}{-6 - 1}$$

are fractions and so the divisors 2, −2, 6, −6 have to be rejected, whereas
the numbers

$$\frac{-8}{3 - 1}, \quad \frac{-8}{3 + 1}, \quad \frac{-8}{-3 - 1}, \quad \frac{-8}{-3 + 1}$$

are integers and so the divisors 3 and −3 have yet to be tested. We apply
the Horner method:

        |  1   −2   −1    −6
    −3  |  1   −5   14   −48

That is, f (−3) = −48 and so −3 is not a root of f (x). Finally,

        |  1   −2   −1   −6
     3  |  1    1    2    0

That is, f (3) = 0: the number 3 is a root of f (x). At the same time we found
the coefficients of the quotient obtained by dividing f (x) by x − 3:

$$f(x) = (x - 3)(x^2 + x + 2)$$

It is readily seen that the quotient $x^2 + x + 2$ does not have 3 as its root, which
means that this number is not a multiple root of f (x).
Example 2. Find integral roots of the polynomial

$$f(x) = 3x^4 + x^3 - 5x^2 - 2x + 2$$

Here, ±1 and ±2 are divisors of the constant term. Furthermore f (1) = −1,
f (−1) = 1, i.e., 1 and −1 do not serve as roots. Finally, since the numbers

$$\frac{1}{2 + 1}, \quad \frac{-1}{-2 - 1}$$

are fractions, it follows that 2 and −2 will not be roots either and so the
polynomial f (x) does not have any integral roots at all.
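
The procedure of the two preceding examples can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: coefficients are assumed to be integers listed from $a_0$ to $a_n$ with $a_n \neq 0$, and the names `horner_divide`, `divisors` and `integral_roots` are introduced here.

```python
def horner_divide(coeffs, a):
    """Divide by (x - a) using the Horner method: (quotient coefficients, f(a))."""
    row = [coeffs[0]]
    for c in coeffs[1:]:
        row.append(row[-1] * a + c)
    return row[:-1], row[-1]

def divisors(m):
    """All positive and negative divisors of a nonzero integer m."""
    m = abs(m)
    pos = [d for d in range(1, m + 1) if m % d == 0]
    return pos + [-d for d in pos]

def integral_roots(coeffs):
    """Integral roots of an integral polynomial with nonzero constant term."""
    f1 = horner_divide(coeffs, 1)[1]      # f(1)
    fm1 = horner_divide(coeffs, -1)[1]    # f(-1)
    roots = [a for a, v in ((1, f1), (-1, fm1)) if v == 0]
    for a in divisors(coeffs[-1]):
        if a in (1, -1):
            continue
        # Reject a unless f(1)/(a - 1) and f(-1)/(a + 1) are both integers.
        if f1 % (a - 1) != 0 or fm1 % (a + 1) != 0:
            continue
        if horner_divide(coeffs, a)[1] == 0:
            roots.append(a)
    return sorted(roots)

print(integral_roots([1, -2, -1, -6]))        # Example 1
print(horner_divide([1, -2, -1, -6], 3))      # quotient and remainder for x - 3
print(integral_roots([3, 1, -5, -2, 2]))      # Example 2
```

For Example 1 the only root found is 3, with quotient coefficients 1, 1, 2 and remainder 0; for Example 2 the list of roots is empty.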


Let us now examine the question of fractional roots.

If an integral polynomial whose leading coefficient is unity has a
rational root, then this root is an integer.

Indeed, let the polynomial

$$f(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \ldots + a_{n-1} x + a_n$$

with integral coefficients have for a root the fraction $\frac{b}{c}$ in lowest
terms, i.e.,

$$\frac{b^n}{c^n} + a_1 \frac{b^{n-1}}{c^{n-1}} + a_2 \frac{b^{n-2}}{c^{n-2}} + \ldots + a_n = 0$$

From this it follows that

$$\frac{b^n}{c} = -a_1 b^{n-1} - a_2 b^{n-2} c - \ldots - a_n c^{n-1}$$

Thus the fraction $\frac{b^n}{c}$, which is in lowest terms, is equal to an integer,
which is impossible.
To obtain all the rational (fractional and integral) roots of an integral
polynomial

$$f(x) = a_0 x^n + a_1 x^{n-1} + a_2 x^{n-2} + \ldots + a_{n-1} x + a_n$$

it is necessary to find all the integral roots of the polynomial

$$\varphi(y) = y^n + a_1 y^{n-1} + a_0 a_2 y^{n-2} + \ldots + a_0^{n-2} a_{n-1} y + a_0^{n-1} a_n$$

and divide them by $a_0$.

Multiply f (x) by $a_0^{n-1}$ and then change the unknown, putting
$y = a_0 x$. Clearly,

$$\varphi(y) = \varphi(a_0 x) = a_0^{n-1} f(x)$$

whence it follows that the roots of the polynomial f (x) are equal to
the roots of the polynomial $\varphi(y)$ divided by $a_0$. In particular, to
rational roots of f (x) there will correspond rational roots of $\varphi(y)$;
however, since the leading coefficient of $\varphi(y)$ is equal to unity, these
roots can only be integral, and we already have a procedure for
finding them.
Example. Find rational roots of the polynomial

$$f(x) = 3x^4 + 5x^3 + x^2 + 5x - 2$$

Multiplying f (x) by $3^3$ and setting y = 3x, we get

$$\varphi(y) = y^4 + 5y^3 + 3y^2 + 45y - 54$$

We seek integral roots of the polynomial $\varphi(y)$.
Let us find $\varphi(1)$ by the Horner method:

        |  1   5   3   45   −54
     1  |  1   6   9   54     0

Thus, $\varphi(1) = 0$, that is, 1 is a root of $\varphi(y)$; and

$$\varphi(y) = (y - 1) q(y)$$

where

$$q(y) = y^3 + 6y^2 + 9y + 54$$

Let us find the integral roots of the polynomial q (y). The numbers ±1,
±2, ±3, ±6, ±9, ±18, ±27, ±54 are divisors of the constant term. Here,

$$q(1) = 70, \qquad q(-1) = 50$$

Computing $\frac{q(1)}{\alpha - 1}$ and $\frac{q(-1)}{\alpha + 1}$ for every divisor $\alpha$ we find that all divisors,
except $\alpha = -6$, must be rejected. Test this divisor:

        |  1   6   9   54
    −6  |  1   0   9    0

Thus, q (−6) = 0, or −6 is a root of q (y) and therefore also of $\varphi(y)$.
Consequently, the polynomial $\varphi(y)$ has integral roots 1 and −6. Thus
the numbers $\frac{1}{3}$ and −2, and only these numbers, are rational roots of the
polynomial f (x).
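
The passage from f (x) to $\varphi(y)$ and back can likewise be sketched in code. The block below is an illustration only: coefficients are assumed to be integers listed from $a_0$ to $a_n$ with $a_n \neq 0$, the integral roots of $\varphi(y)$ are found by simply trying every divisor of its constant term (without the shortcut described earlier), and the function names are introduced here.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Value of the polynomial at x by the Horner method."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def monic_transform(coeffs):
    """Coefficients of phi(y) = a_0^(n-1) f(y / a_0), highest degree first."""
    a0 = coeffs[0]
    # a_k (the coefficient of x^(n-k)) becomes a_0^(k-1) a_k; the leading one becomes 1.
    return [1] + [a0 ** (k - 1) * c for k, c in enumerate(coeffs[1:], start=1)]

def rational_roots(coeffs):
    """Rational roots of an integral polynomial with nonzero constant term."""
    phi = monic_transform(coeffs)
    const = abs(phi[-1])
    candidates = [s * d for d in range(1, const + 1) if const % d == 0
                  for s in (1, -1)]
    # Integral roots of phi, divided by a_0, are the rational roots of f.
    return sorted(Fraction(y, coeffs[0]) for y in candidates
                  if evaluate(phi, y) == 0)

f = [3, 5, 1, 5, -2]
print(monic_transform(f))
print(rational_roots(f))
```

For the polynomial of the example, `monic_transform` gives the coefficients 1, 5, 3, 45, −54 of $\varphi(y)$, and the rational roots found are −2 and 1/3.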


It must be stressed once again that the above-described methods 
are applicable only to polynomials with integral coefficients and 
only for finding their rational roots.


58. Algebraic Numbers 


Every polynomial of degree n with rational coefficients has n roots 
in the field of complex numbers; some of these roots (or even all of 
them) can lie outside the field of rational numbers. However, not 
every complex or real number serves as a root of some polynomial 
with rational coefficients. The complex (or, in particular, real) 
numbers which are roots of such polynomials are called algebraic 
numbers in contrast to transcendental numbers. Algebraic numbers 
include all rational numbers (as the roots of first-degree polynomials
with rational coefficients) and also any radical of the form ⁿ√a
with rational radicand a (as a root of the binomial x^n − a). On the
other hand, the more comprehensive courses of mathematical analysis
offer proof of the transcendence of the number e (the base of the
system of natural logarithms) and also of the familiar number π of
elementary geometry.

If a number α is algebraic, then it will even be a root of some
polynomial with integral coefficients and therefore a root of one of the
irreducible divisors of this polynomial, also with integral coefficients.
The irreducible integral polynomial, of which α is a root, is determined
uniquely to within a constant factor, that is to say, quite uniquely if
we require that the coefficients of the polynomial be relatively prime
jointly (i.e., that the polynomial be primitive). Indeed, if α serves as
a root of two irreducible polynomials f(x) and g(x), then the greatest
common divisor of these polynomials will be different from unity,
and therefore the polynomials, due to their irreducibility, can differ
from one another by a zero-degree factor only.

Algebraic numbers which are roots of one and the same irreducible
(over the field R) polynomial are termed conjugate.* Thus, the whole
set of algebraic numbers breaks up into disjoint finite classes of
conjugate numbers. No rational number, as a root of a first-degree
polynomial, has conjugate numbers different from itself; this property
is characteristic of rational numbers: every algebraic number which
is not rational is a root of an irreducible polynomial of degree greater
than unity, and for this reason it has conjugate numbers different
from itself.

* Not to be confused with the concept of the conjugacy of complex
numbers.

The set of all algebraic numbers is a subfield of the field of complex 
numbers. In other words, the sum, difference, product and quotient of 
algebraic numbers are algebraic numbers. 

In fact, suppose we have the algebraic numbers α and β. Denote
by α_1 = α, α_2, ..., α_n all numbers conjugate to α, by β_1 = β, β_2,
..., β_s all numbers conjugate to β, and by f(x) and g(x) irreducible
polynomials with rational coefficients having for roots α and β
respectively. Write a polynomial whose roots are all possible sums
α_i + β_j:

$$\varphi(x) = \prod_{i=1}^{n} \prod_{j=1}^{s} \bigl(x - (\alpha_i + \beta_j)\bigr)$$

It is obvious that the coefficients of this polynomial will not change
under rearrangements of all α_i and also of all β_j. Hence, on the
basis of the theorem on polynomials symmetric with respect to two
systems of unknowns (see end of Sec. 53), they are polynomials in
the coefficients of the polynomials f(x) and g(x). In other words,
the coefficients of the polynomial φ(x) prove to be rational numbers,
and therefore the number α + β = α_1 + β_1, which is one of its
roots, will be algebraic.
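As an illustration of this construction (an example chosen here, not taken from the text), take α = √2 and β = √3, with f(x) = x^2 − 2 and g(x) = x^2 − 3. The conjugates are ±√2 and ±√3, and the product over all four sums works out as follows:

```latex
\begin{aligned}
\varphi(x) &= \bigl(x - \sqrt2 - \sqrt3\bigr)\bigl(x - \sqrt2 + \sqrt3\bigr)
              \bigl(x + \sqrt2 - \sqrt3\bigr)\bigl(x + \sqrt2 + \sqrt3\bigr) \\
           &= \bigl((x - \sqrt2)^2 - 3\bigr)\bigl((x + \sqrt2)^2 - 3\bigr) \\
           &= \bigl(x^2 - 1 - 2\sqrt2\,x\bigr)\bigl(x^2 - 1 + 2\sqrt2\,x\bigr) \\
           &= (x^2 - 1)^2 - 8x^2 \;=\; x^4 - 10x^2 + 1
\end{aligned}
```

All radicals cancel, as the symmetry argument predicts, so √2 + √3 is a root of a polynomial with rational (here even integral) coefficients.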

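The algebraic nature of the numbers α − β and αβ is proved
in similar fashion with the aid of the polynomials

$$\psi(x) = \prod_{i=1}^{n} \prod_{j=1}^{s} \bigl(x - (\alpha_i - \beta_j)\bigr)$$

and

$$\chi(x) = \prod_{i=1}^{n} \prod_{j=1}^{s} \bigl(x - \alpha_i \beta_j\bigr)$$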

To prove the algebraic nature of a quotient, it suffices to
demonstrate that if a number α is algebraic and different from zero, then
α^{-1} will also be an algebraic number. Let α be a root of the
polynomial

$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$$

with rational coefficients. Then, evidently, the polynomial

$$g(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$$

which also has rational coefficients, will have for a root the number
α^{-1}, which is what we set out to prove.
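The reversal of coefficients is easy to check mechanically. The sketch below is an illustration only, with hypothetical function names; it uses a polynomial with rational roots so that the check can be done exactly, although the identity g(x) = x^n f(1/x) behind the argument holds for any nonzero root.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Horner evaluation; coeffs run from the leading term a_0 down to a_n."""
    value = Fraction(0)
    for a in coeffs:
        value = value * x + a
    return value


def reversed_poly(coeffs):
    """Coefficients of g(x) = a_n x^n + ... + a_1 x + a_0."""
    return coeffs[::-1]


f = [3, -7, 2]            # f(x) = 3x^2 - 7x + 2 = (x - 2)(3x - 1)
g = reversed_poly(f)      # g(x) = 2x^2 - 7x + 3

for root in (Fraction(2), Fraction(1, 3)):
    # Each root of f gives its reciprocal as a root of g.
    print(root, evaluate(f, root), evaluate(g, 1 / root))
```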

It follows, from the theorem just proved, that any sum of a rational
number and a radical, say 1 + √2, and also any sum of radicals,
say √3 + √5, will be algebraic numbers. However, we
cannot as yet assert that numbers written as radicals within radicals,
say √(1 + √2), are algebraic. This will be a consequence of the
following theorem.
If the number ω is a root of the polynomial

$$\varphi(x) = x^n + \alpha x^{n-1} + \beta x^{n-2} + \dots + \lambda x + \mu$$

whose coefficients are algebraic numbers, then ω is also an algebraic
number.

Let α_i, β_j, ..., λ_s, μ_t run through numbers which are respectively
conjugate to α, β, ..., λ, μ, it being true that α_1 = α,
β_1 = β, ..., λ_1 = λ, μ_1 = μ. Consider all possible polynomials
of the form

$$\varphi_{i,j,\dots,s,t}(x) = x^n + \alpha_i x^{n-1} + \beta_j x^{n-2} + \dots + \lambda_s x + \mu_t$$

so that φ_{1,1,...,1,1}(x) = φ(x), and take the product of all these
polynomials:

$$F(x) = \prod_{i,j,\dots,s,t} \varphi_{i,j,\dots,s,t}(x)$$

The coefficients of the polynomial F(x) are obviously symmetric
with respect to each of the systems α_i, β_j, ..., λ_s, μ_t and therefore
(again by the theorem of Sec. 53) are polynomials in the coefficients
of those irreducible polynomials (with rational coefficients)
whose roots are, respectively, α, β, ..., λ, μ; that is to say, they
are themselves rational numbers. The number ω, being a root of
φ(x), will, consequently, be a root of the polynomial F(x) with
rational coefficients, i.e., it will be an algebraic number.


Let us apply this theorem to the number ω = √(1 + √2). The
number α = 1 + √2 is algebraic by the previous theorem and
therefore the number ω is a root of the polynomial x^2 − α with
algebraic coefficients; that is, it is itself algebraic. Generally,
applying several times both theorems that have just been proved,
the reader will easily arrive at the following result.

Any number written in radicals over the field of rational numbers 
(that is to say, a number expressed in terms of some arbitrarily compli- 
cated combination of radicals—radicals within radicals, in the general 
case) is an algebraic number. 
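The product construction of the last theorem can be carried out by hand in a small case. As an assumed illustration (the particular number is chosen here for convenience), take ω = √(1 + √2). The coefficient α = 1 + √2 is a root of x^2 − 2x − 1, whose other root is the conjugate 1 − √2, so the product F(x) has just two factors:

```latex
\begin{aligned}
F(x) &= \bigl(x^2 - (1 + \sqrt2)\bigr)\bigl(x^2 - (1 - \sqrt2)\bigr) \\
     &= (x^2 - 1)^2 - 2 \\
     &= x^4 - 2x^2 - 1
\end{aligned}
```

Indeed ω^2 − 1 = √2, hence (ω^2 − 1)^2 = 2, which is the same equation ω^4 − 2ω^2 − 1 = 0.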
Obviously, algebraic numbers written as radicals constitute
a field. One must bear in mind, however, that this field, as follows
from the remark made (without proof) at the end of Sec. 38, will
only be a part of the field of all algebraic numbers.

We have already mentioned the transcendence of two numbers:
e and π. Actually, however, there are an infinity of transcendental
numbers. What is more, using the concepts and methods of set
theory, we will show that there are, so to say, even more transcendental
numbers than algebraic numbers. The exact meaning of this
sentence will be clear from what follows.

An infinite set M is called countable (denumerable), if it can 
be put into one-to-one correspondence with the set of natural num- 
bers, that is to say, if its elements can be enumerated with the aid 
of the natural numbers; otherwise it is noncountable.

Lemma 1. Every infinite set M contains a countable subset. 

Indeed, take an arbitrary element a_1 in M and then an element
a_2 different from a_1. Generally, let there be chosen n distinct elements
a_1, a_2, ..., a_n in M. Since the set M is infinite, it cannot be
exhausted by these elements, and so we can find an element a_{n+1}
different from them. Continuing this process, we will find in M an infinite
subset composed of the elements

$$a_1, a_2, \dots, a_n, \dots$$

The countability of this subset is obvious.
Lemma 2. Every infinite subset B of a countable set A is itself 
countable. 
Because of its countability, the set A can be written as

$$a_1, a_2, \dots, a_n, \dots \qquad (1)$$

Let a_{k_1} be the first element of the sequence (1) belonging to B,
a_{k_2} the second element with this same property, etc. Setting
a_{k_n} = b_n, n = 1, 2, ..., we find that the elements of the subset B
constitute a sequence

$$b_1, b_2, \dots, b_n, \dots$$

It is clear that this subset is countable.

Lemma 3. The union of a countable set of finite sets which pairwise 
do not have any common elements is a countable set. 

Indeed, suppose we have the finite sets

$$A_1, A_2, \dots, A_n, \dots$$

Let their union be B. We will obviously enumerate all elements of
the set B if, in arbitrary fashion, we number the elements of the
finite set A_1, then continue the numbering by passing to the elements
of the set A_2, and so on.
Lemma 4. The union of two countable sets which are devoid of
common elements is a countable set.
Let there be given a countable set A with elements

$$a_1, a_2, \dots, a_n, \dots$$

and a countable set B with elements

$$b_1, b_2, \dots, b_n, \dots$$

and let the union of these sets be C. If we put

$$a_n = c_{2n-1}, \qquad b_n = c_{2n}, \qquad n = 1, 2, \dots$$

then all elements of C will be represented as the sequence

$$c_1, c_2, \dots, c_{2n-1}, c_{2n}, \dots$$

This completes the proof of the countability of this set.
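The numbering used in Lemma 4 is simply an interleaving of the two sequences. A minimal sketch, with a hypothetical function name and with the even and odd natural numbers standing in for the sets A and B:

```python
from itertools import count, islice


def interleave(a, b):
    """Enumerate the union of two disjoint sequences as c_1, c_2, c_3, ...

    a_n lands in position 2n - 1 and b_n in position 2n, as in Lemma 4.
    """
    for x, y in zip(a, b):
        yield x
        yield y


evens = (2 * n for n in count(1))        # a_n = 2n
odds = (2 * n - 1 for n in count(1))     # b_n = 2n - 1
first = list(islice(interleave(evens, odds), 6))
print(first)
```

Because both inputs are generated lazily, any finite initial segment of the combined sequence is obtained after finitely many steps, which is exactly what countability requires.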

Now let us prove the following theorem. 

The set of all algebraic numbers is countable. 

First let us prove the countability of the set of all polynomials in
one unknown with integral coefficients. If

$$f(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$$

is such a polynomial (different from zero), let us use the term height
of the polynomial for the natural number

$$h_f = n + |a_0| + |a_1| + \dots + |a_{n-1}| + |a_n|$$

It is obvious that there is only a finite number of integral polynomials
with a given height h; denote this set by M_h. Denote the set
consisting of zero alone by M_0. The set of all integral polynomials
will be the union of the countable set of the finite sets M_0, M_1,
M_2, ..., M_h, ...; that is to say, by Lemma 3, it is countable.

From this, by Lemma 2, it follows that the set of all integral 
primitive irreducible polynomials is also countable. At the same time, 
we know that every algebraic number is a root of one and only one 
integral primitive irreducible polynomial. Consequently, collect- 
ing the roots of all such polynomials, that is, taking the union 
of the countable set of finite sets, we obtain the set of all algebraic 
numbers. This set will thus, by Lemma 3, be countable. 
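The finiteness of each set of polynomials of a given height can be made concrete by listing them. The following sketch is only an illustration: the function name is hypothetical, a polynomial is represented by its tuple of coefficients (a_0, ..., a_n) with a_0 nonzero, and the search is a brute-force one.

```python
from itertools import product


def polys_of_height(h):
    """All integer polynomials (a_0, ..., a_n), a_0 != 0, with
    n + |a_0| + ... + |a_n| equal to h."""
    result = []
    for n in range(h):                 # degree n leaves h - n >= 1 for the coefficients
        s = h - n
        for coeffs in product(range(-s, s + 1), repeat=n + 1):
            if coeffs[0] != 0 and sum(abs(a) for a in coeffs) == s:
                result.append(coeffs)
    return result


for h in (1, 2, 3):
    print(h, len(polys_of_height(h)))
```

Running through h = 1, 2, 3, ... and listing each finite set in turn gives an enumeration of all nonzero integral polynomials, which is the content of Lemma 3 in this setting.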

Finally, let us prove the following theorem. 

The set of all transcendental numbers is noncountable. 

Let us first consider the set F of all real numbers x between
zero and unity, 0 < x < 1, and let us prove that this set is noncountable.
We know that each of the indicated numbers x may be written
as a regular infinite decimal fraction

$$x = 0.\alpha_1 \alpha_2 \dots \alpha_n \dots$$

and that this notation is unique if we do not allow for fractions
in which for all n beyond some n = N all α_n = 9; conversely, any
fraction of this form is equal to some number x from the set F. Now
suppose that the set F is countable, that is, that all the numbers x
can be written as the sequence

$$x_1, x_2, \dots, x_k, \dots \qquad (2)$$

Let

$$x_k = 0.\alpha_{k1} \alpha_{k2} \dots \alpha_{kn} \dots$$

be the notation of the number x_k in the form of an infinite decimal.
Now write the infinite decimal fraction

$$0.\beta_1 \beta_2 \dots \beta_n \dots \qquad (3)$$

assuming β_1 to be different from the first decimal digit of the fraction
x_1, that is, β_1 ≠ α_11, β_2 to be different from the second decimal
digit of the fraction x_2, i.e., β_2 ≠ α_22, and, generally, β_n ≠ α_nn.
Besides, assume that among the digits β_n there are infinitely many
that are different from the digit 9. It is clear that there is a fraction
(3) which satisfies all these requirements. It is thus a number
in the set F, but it is different, by its construction, from all the
numbers of the sequence (2). This contradiction proves the
noncountability of the set F.
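The choice of the digits of the fraction (3) can be sketched on a finite table of digits. This is an illustration under stated assumptions: the function name and the sample digits are hypothetical, and the rule "take 1 unless the diagonal digit is 1, in which case take 2" is just one admissible choice; it never produces the digit 9, so the requirement about digits different from 9 is met automatically.

```python
def diagonal_digits(rows):
    """Return digits b_1, ..., b_n with b_k != rows[k][k] and no digit equal to 9."""
    return [1 if row[k] != 1 else 2 for k, row in enumerate(rows)]


# First four digits of four hypothetical decimals x_1, ..., x_4.
rows = [
    [1, 4, 1, 5],
    [3, 3, 3, 3],
    [9, 9, 9, 0],
    [0, 1, 0, 1],
]
beta = diagonal_digits(rows)
print(beta)
```

The resulting fraction differs from the kth number of the list in its kth digit, so it cannot coincide with any of them.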

Whence follows the noncountability of the set of all complex num- 
bers: if the set were countable, then, by Lemma 2, it could not con- 
tain the noncountable subset F. The noncountability of the set 
of all transcendental numbers is now, by Lemma 4, obvious, since 
the union of this set with the countable set of all algebraic numbers 
is the set of all complex numbers, that is to say, it is noncountable. 

The two theorems we have proved show us, due to Lemma 1, 
that the set of the transcendental numbers is indeed much richer 
in elements (that is to say, more “potent”) than the set of algebraic 
numbers. 


CHAPTER 13 


NORMAL FORM 
OF A MATRIX 


59. Equivalence of λ-Matrices


We return now to problems of linear algebra. Chapter 7 demon- 
strated the important role of the concept of similarity of matrices. 
Namely, two square matrices of order n are similar if and only if 
they represent (in different bases) the same linear transformation 
of n-dimensional linear space. However, we are not yet able to tell 
whether two given specific matrices are similar or not. On the other 
hand, among all matrices similar to a given matrix A, we are not 
able to indicate a matrix of elementary form (in one sense or another); 
even the question of the conditions under which a matrix A is simi- 
lar to a diagonal matrix was considered in Sec. 33 only for a parti- 
cular case. These are the questions we will take up in this chapter. 
(Note that they are discussed straight off for the case of an arbitrary 
base field P.) 

Let us first investigate square matrices of order n whose elements
are polynomials of arbitrary degree in a single unknown λ with
coefficients from the field P. These are called polynomial matrices
or, briefly, λ-matrices. An example of a λ-matrix is the characteristic
matrix A − λE of an arbitrary square matrix A with elements in P.
The principal diagonal of this matrix contains first-degree polynomials,
all off-diagonal elements are zero-degree polynomials or
zeros. Every matrix with elements from the field P (for brevity, we
call them numerical matrices) is also a special case of a λ-matrix:
its elements are polynomials of degree zero or zeros.

Suppose we have a λ-matrix

$$A(\lambda) = \begin{pmatrix} a_{11}(\lambda) & \dots & a_{1n}(\lambda) \\ \vdots & & \vdots \\ a_{n1}(\lambda) & \dots & a_{nn}(\lambda) \end{pmatrix}$$

We use the term elementary transformations of this matrix for the
following four types of transformation:

(1) multiplication of any row of the matrix A(λ) by any scalar α
in P different from zero;

(2) multiplication of any column of A(λ) by any scalar α in P
different from zero;

(3) addition, to any ith row of matrix A(λ), of any jth row of it,
j ≠ i, multiplied by any polynomial φ(λ) in the ring P[λ];

(4) addition, to any ith column of matrix A(λ), of any jth column
of it, j ≠ i, multiplied by any polynomial φ(λ) in the ring P[λ].

It is readily seen that for every elementary transformation of the
λ-matrix there is an inverse transformation which is also elementary.
Thus, the inverse of (1) is an elementary transformation consisting
in the multiplication of that row by the number α^{-1}, which exists
due to the condition α ≠ 0; the inverse of (3) is a transformation
which consists in adding to the ith row the jth row multiplied by
−φ(λ).

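It is possible to interchange any two rows or any two columns in
a matrix A(λ) by a number of elementary transformations.

Suppose we wish to interchange the ith and jth rows of A(λ).
This can be accomplished by means of four elementary transformations
as the scheme below illustrates:

$$\begin{pmatrix} i \\ j \end{pmatrix} \to \begin{pmatrix} i + j \\ j \end{pmatrix} \to \begin{pmatrix} i + j \\ -i \end{pmatrix} \to \begin{pmatrix} j \\ -i \end{pmatrix} \to \begin{pmatrix} j \\ i \end{pmatrix}$$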


The sequence of transformations is: (a) add jth row to ith row; (b) sub- 
tract the new ith row from the jth row; (c) add the new jth row to the 
new ith row; (d) multiply the new jth row by —1. 
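The four steps (a) to (d) can be traced on a concrete matrix. The sketch below is an illustration with a hypothetical function name; it uses integer entries for brevity, but the same steps apply unchanged when the entries are polynomials in λ, since only addition, subtraction and multiplication by −1 are involved.

```python
def swap_rows_elementary(matrix, i, j):
    """Interchange rows i and j using only the transformations (a)-(d)."""
    m = [row[:] for row in matrix]
    m[i] = [x + y for x, y in zip(m[i], m[j])]   # (a) add row j to row i
    m[j] = [y - x for x, y in zip(m[i], m[j])]   # (b) subtract new row i from row j
    m[i] = [x + y for x, y in zip(m[i], m[j])]   # (c) add new row j to new row i
    m[j] = [-y for y in m[j]]                    # (d) multiply new row j by -1
    return m


A = [[1, 2], [3, 4], [5, 6]]
print(swap_rows_elementary(A, 0, 2))
```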

We will say that the λ-matrices A(λ) and B(λ) are equivalent
and we will write A(λ) ~ B(λ) if the matrix A(λ) can be carried
into the matrix B(λ) by means of a finite number of elementary
transformations. This equivalence relation is obviously reflexive
and transitive and also symmetric, due to the existence of an inverse
elementary transformation for every elementary transformation.
In other words, all square λ-matrices of order n over the field P break
up into disjoint classes of equivalent matrices.

Our immediate aim will be to find the simplest kind of matrices
among all the λ-matrices equivalent to the given matrix A(λ).
To do this, we introduce the following concept. A canonical λ-matrix
is a λ-matrix with the following three properties:

(a) the matrix is diagonal, that is, of the form

$$\begin{pmatrix} e_1(\lambda) & & & 0 \\ & e_2(\lambda) & & \\ & & \ddots & \\ 0 & & & e_n(\lambda) \end{pmatrix} \qquad (1)$$

(b) any polynomial e_i(λ), i = 2, 3, ..., n, is exactly divisible
by the polynomial e_{i−1}(λ);

(c) the leading coefficient of every polynomial e_i(λ), i = 1, 2,
..., n, is equal to unity if the polynomial is nonzero.

Note that if among the polynomials e_i(λ) on the principal diagonal
of the canonical λ-matrix (1) there are some equal to zero, then,
by property (b), they invariably occupy the last positions on the
principal diagonal. On the other hand, if there are zero-degree
polynomials among the polynomials e_i(λ), then, by property (c), they
are all equal to unity, and, by property (b), they occupy the first
positions on the principal diagonal of the matrix (1).

The canonical λ-matrices embrace, among others, the numerical
matrices, including the unit and zero matrices.

Any λ-matrix is equivalent to some canonical λ-matrix, that is to say,
it can be reduced to canonical form via elementary transformations.

We will prove this theorem by induction with respect to the
order n of the λ-matrices at hand. Indeed, for n = 1 we have

$$A(\lambda) = \bigl(a(\lambda)\bigr)$$

If a(λ) = 0, then our matrix is already canonical. But if a(λ) ≠ 0,
then it suffices to divide the polynomial a(λ) by its leading
coefficient (this is an elementary matrix transformation) in order to get
a canonical matrix.

Suppose the theorem has been proved for λ-matrices of
order n − 1. We consider an arbitrary λ-matrix A(λ) of order n.
If it is a zero matrix, it is already canonical and no proof is needed.
We therefore take it that there are nonzero elements among the
elements of matrix A(λ).

Interchanging (if necessary) rows and columns of A(λ), we can
move one of the nonzero elements into the upper left-hand corner.
Thus, of the λ-matrices equivalent to A(λ), there are some with
a nonzero polynomial in the upper left corner. Let us consider all
such matrices. The polynomials in the upper left corner of these
matrices may have different degrees. But the degree of a polynomial
is a natural number, and in any nonempty set of natural numbers
there is a least number. It is thus possible to find, from among all
the λ-matrices equivalent to A(λ) and having a nonzero element
in the upper left corner, one matrix such that the polynomial in the
upper left corner is of the lowest possible degree. Finally, dividing
the first row of this matrix by the leading coefficient of the indicated
polynomial, we get a λ-matrix equivalent to A(λ),

$$A(\lambda) \sim \begin{pmatrix} e_1(\lambda) & b_{12}(\lambda) & \dots & b_{1n}(\lambda) \\ b_{21}(\lambda) & b_{22}(\lambda) & \dots & b_{2n}(\lambda) \\ \vdots & \vdots & & \vdots \\ b_{n1}(\lambda) & b_{n2}(\lambda) & \dots & b_{nn}(\lambda) \end{pmatrix}$$

such that e_1(λ) ≠ 0, the leading coefficient of this polynomial is
equal to unity, and no combination of elementary transformations
can carry the resulting matrix into a matrix in which the upper
left-hand corner would be occupied by a nonzero polynomial of lower
degree.

We now prove that all elements of the first row and first column of
the matrix obtained are exactly divisible by e_1(λ). Suppose, for example,
for 2 ≤ j ≤ n,

$$b_{1j}(\lambda) = e_1(\lambda)\, q(\lambda) + r(\lambda)$$

where the degree of r(λ) is less than the degree of e_1(λ) if r(λ) is
different from zero. Then, subtracting from the jth column of our
matrix the first column multiplied by q(λ) and interchanging the
first and jth columns, we obtain a matrix equivalent to A(λ) in the
upper left corner of which is the polynomial r(λ), that is to say,
a polynomial of lower degree than e_1(λ), which contradicts the
choice of this polynomial, whence it follows that r(λ) = 0. The
proof is complete.

Now subtracting from the jth column of our matrix the first
column multiplied by q(λ), we replace the element b_{1j}(λ) by zero.
Performing such transformations for j = 2, 3, ..., n, we substitute
zeros for all elements b_{1j}(λ). In similar fashion we substitute
zeros for all elements b_{i1}(λ), i = 2, 3, ..., n. We thus arrive at
a matrix, equivalent to A(λ), in the upper left corner of which is the
polynomial e_1(λ), all other elements of the first row and the first column
being zero:

$$A(\lambda) \sim \begin{pmatrix} e_1(\lambda) & 0 & \dots & 0 \\ 0 & c_{22}(\lambda) & \dots & c_{2n}(\lambda) \\ \vdots & \vdots & & \vdots \\ 0 & c_{n2}(\lambda) & \dots & c_{nn}(\lambda) \end{pmatrix} \qquad (2)$$

By the induction hypothesis, the matrix of order n − 1 in the
lower right corner of the matrix (2) that we have obtained can be
reduced to canonical form by elementary transformations:

$$\begin{pmatrix} c_{22}(\lambda) & \dots & c_{2n}(\lambda) \\ \vdots & & \vdots \\ c_{n2}(\lambda) & \dots & c_{nn}(\lambda) \end{pmatrix} \sim \begin{pmatrix} e_2(\lambda) & & 0 \\ & \ddots & \\ 0 & & e_n(\lambda) \end{pmatrix}$$

Having performed these same transformations on the corresponding
rows and columns of matrix (2) (in the process, the first row and
first column will obviously remain unchanged), we find that

$$A(\lambda) \sim \begin{pmatrix} e_1(\lambda) & & & 0 \\ & e_2(\lambda) & & \\ & & \ddots & \\ 0 & & & e_n(\lambda) \end{pmatrix} \qquad (3)$$

To prove that the matrix (3) is canonical, it remains to demonstrate
that e_2(λ) is exactly divisible by e_1(λ). Suppose

$$e_2(\lambda) = e_1(\lambda)\, q(\lambda) + r(\lambda)$$

where r(λ) ≠ 0 and the degree of r(λ) is less than that of e_1(λ).
However, by adding to the second column of (3) the first column
multiplied by q(λ) and then subtracting the first row from the
second, we replace the element e_2(λ) by the element r(λ). Then,
by interchanging the first two rows and the first two columns, we
transfer the polynomial r(λ) to the upper left corner of the matrix,
but this contradicts the choice of the polynomial e_1(λ).

The theorem on the reduction of a λ-matrix to canonical form
is proved. We have to supplement it with the following uniqueness
theorem.

Every λ-matrix is equivalent to one canonical matrix only.

Suppose we have an arbitrary λ-matrix A(λ) of order n. Take
some natural number k, 1 ≤ k ≤ n, and consider all kth-order
minors of A(λ). Computing these minors, we obtain a finite system
of polynomials in λ; we denote the greatest common divisor of this
system of polynomials with leading coefficient 1 by d_k(λ).

We thus have the polynomials

$$d_1(\lambda),\ d_2(\lambda),\ \dots,\ d_n(\lambda) \qquad (4)$$

which are uniquely defined by the matrix A(λ) itself. Here, d_1(λ)
is the greatest common divisor of all elements of A(λ) with leading
coefficient 1, and d_n(λ) is equal to the determinant of the matrix A(λ)
divided by its leading coefficient. Also note that if the matrix A(λ)
has rank r, then

$$d_{r+1}(\lambda) = \dots = d_n(\lambda) = 0$$

whereas all the remaining polynomials of system (4) are different
from zero.

The greatest common divisor d_k(λ) of all minors of order k of the
λ-matrix A(λ), k = 1, 2, ..., n, remains unchanged under
elementary transformations of A(λ).

This assertion is almost obvious when an elementary transformation
of type (1) or (2) is performed in matrix A(λ). For instance,
if the ith row of the matrix is multiplied by a number α in the field P,
α ≠ 0, then the kth-order minors through which the ith row passes
will be multiplied by α, whereas all the other kth-order minors will
remain unchanged. But when seeking the greatest common divisor
of several polynomials, any one of the polynomials can be multiplied
with impunity by nonzero numbers from P.

Let us now consider elementary transformations of type (3)
or (4). Let us, say, add to the ith row of A(λ) the jth row, j ≠ i,
multiplied by the polynomial φ(λ); denote the resulting matrix
by Ā(λ) and denote by d̄_k(λ) the greatest common divisor of all
its kth-order minors taken with leading coefficient 1. Let us see
what happens to the kth-order minors of A(λ) under this transformation.

It is clear that minors through which the ith row does not pass
remain unchanged. Likewise, there is no change in those minors
through which both the ith and jth rows pass, since a determinant
is unaltered by adding a multiple of one row to another row. Finally,
let us take any kth-order minor with the ith row passing through it,
but not the jth row; denote it by M. The corresponding minor of the
matrix Ā(λ) can evidently be represented by the sum of the minor M
and the minor M′, multiplied by φ(λ), of the matrix A(λ), which M′
is obtained from M by replacing the elements of the ith row of A(λ)
by the corresponding elements of its jth row. Since both M and M′
are divisible by d_k(λ), it follows that M + φ(λ)M′ will also be
divisible by d_k(λ).

From the foregoing it follows that all the kth-order minors of
matrix Ā(λ) are exactly divisible by d_k(λ) and therefore d̄_k(λ)
too is divisible by d_k(λ). But since the elementary transformation
at hand has an inverse of the same type, it follows that d_k(λ) is
likewise divisible by d̄_k(λ). But if one takes into account that the
leading coefficients of both these polynomials are equal to unity,
then d̄_k(λ) = d_k(λ), which completes the proof.

Thus, all λ-matrices equivalent to the matrix A(λ) are associated
with one and the same set of polynomials (4). Specifically, this refers
to any one (if there are several) canonical matrix equivalent to A(λ).
Let (3) be such a matrix.

Let us compute the polynomial d_k(λ), k = 1, 2, ..., n, using
matrix (3). Clearly, the kth-order minor in the upper left corner of
this matrix is equal to the product

$$e_1(\lambda)\, e_2(\lambda) \cdots e_k(\lambda) \qquad (5)$$

Furthermore, if we take, in matrix (3), the kth-order minor in the
rows with indices i_1, i_2, ..., i_k, where i_1 < i_2 < ... < i_k, and in
columns with the same indices, then this minor is equal to the product
e_{i_1}(λ) e_{i_2}(λ) ... e_{i_k}(λ), which is divisible by (5). Indeed, 1 ≤ i_1
and so e_{i_1}(λ) is divisible by e_1(λ), 2 ≤ i_2 and therefore e_{i_2}(λ) is
divisible by e_2(λ), and so on. Finally, if in matrix (3) we take the
kth-order minor through which the ith row of this matrix passes for
at least one i, but not its ith column, then this minor contains
a zero row and is therefore equal to zero.

It follows from the foregoing that the product (5) will be the
greatest common divisor of all kth-order minors of matrix (3) and,
therefore, of the original matrix A(λ),

$$d_k(\lambda) = e_1(\lambda)\, e_2(\lambda) \cdots e_k(\lambda), \qquad k = 1, 2, \dots, n \qquad (6)$$

It is now easy to show that the polynomials e_k(λ), k = 1, 2,
..., n, are uniquely determined by the matrix A(λ) itself. Let the
rank of this matrix be r. Then, as we know, d_r(λ) ≠ 0, but
d_{r+1}(λ) = 0, and therefore, by (6), e_{r+1}(λ) = 0. Whence, because of the
properties of a canonical matrix, it follows generally that if the
rank r of matrix A(λ) is less than n, then

$$e_{r+1}(\lambda) = e_{r+2}(\lambda) = \dots = e_n(\lambda) = 0 \qquad (7)$$

On the other hand, for k ≤ r, it follows from (6), because
d_{k−1}(λ) ≠ 0, that

$$e_k(\lambda) = \frac{d_k(\lambda)}{d_{k-1}(\lambda)}$$

This completes the proof of the uniqueness of the canonical form
of the λ-matrix. At the same time we have obtained a direct procedure
for finding polynomials e_k(λ), which are called invariant factors
of the matrix A(λ).


Example. Reduce the λ-matrix

$$A(\lambda) = \begin{pmatrix} \lambda^3 - \lambda & 2\lambda^2 \\ \lambda^2 + 5\lambda & 3\lambda \end{pmatrix}$$

to canonical form. Performing a series of elementary transformations, we get

$$A(\lambda) \sim \dots \sim \begin{pmatrix} \lambda & 0 \\ 0 & \lambda^3 - 10\lambda^2 - 3\lambda \end{pmatrix}$$

On the other hand, it is possible to compute the invariant factors
of the matrix A(λ) directly. Namely, computing the greatest common divisor
of the elements of this matrix, we obtain

$$d_1(\lambda) = e_1(\lambda) = \lambda$$

Now, computing the determinant of A(λ) and noting that its leading coefficient
is equal to 1, we obtain

$$d_2(\lambda) = \lambda^4 - 10\lambda^3 - 3\lambda^2$$

and so

$$e_2(\lambda) = \frac{d_2(\lambda)}{d_1(\lambda)} = \lambda^3 - 10\lambda^2 - 3\lambda$$
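The direct computation of invariant factors through the polynomials d_k(λ) can be sketched in code. The example below is an illustration only: all function names are hypothetical, polynomials are stored as coefficient lists from degree 0 upward with exact rational arithmetic, and the routine is restricted to nonsingular matrices of order 2 to keep it short.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients (lists run from degree 0 upward)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(p, q):
    n = max(len(p), len(q))
    return trim([(p[k] if k < len(p) else 0) - (q[k] if k < len(q) else 0)
                 for k in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def divmod_poly(p, q):
    """Quotient and remainder of p by a nonzero polynomial q."""
    p, q = [Fraction(a) for a in trim(p)], trim(q)
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while len(p) >= len(q):
        c, k = p[-1] / q[-1], len(p) - len(q)
        quot[k] = c
        for i, b in enumerate(q):
            p[i + k] -= c * b
        p = trim(p)
    return trim(quot), p


def monic(p):
    p = trim(p)
    return [Fraction(a) / p[-1] for a in p]


def gcd(p, q):
    """Monic greatest common divisor by Euclid's algorithm."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return monic(p)


def invariant_factors_2x2(a, b, c, d):
    """e_1 = d_1 and e_2 = d_2 / d_1 for a nonsingular matrix [[a, b], [c, d]]."""
    d1 = gcd(gcd(a, b), gcd(c, d))
    d2 = monic(sub(mul(a, d), mul(b, c)))
    return d1, divmod_poly(d2, d1)[0]


# The matrix of the example: rows (l^3 - l, 2 l^2) and (l^2 + 5 l, 3 l).
e1, e2 = invariant_factors_2x2([0, -1, 0, 1], [0, 0, 2], [0, 5, 1], [0, 3])
print(e1, e2)
```

For matrices of higher order one would have to form all kth-order minors for every k, which grows quickly; reduction by elementary transformations is usually the more practical route.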
60. Unimodular λ-Matrices. Relationship Between Similarity
of Numerical Matrices and the Equivalence
of Their Characteristic Matrices


From the results of the preceding section there follows a criterion
of equivalence of λ-matrices, which may be stated in either of two
almost identical formulations.

Two λ-matrices are equivalent if and only if they can be reduced to one
and the same canonical form.

Two λ-matrices are equivalent if and only if they have the same
invariant factors.

Let us derive another criterion of a different nature.

We know that the unit matrix E is a canonical λ-matrix. We
call the λ-matrix U(λ) unimodular if it has the matrix E for its
canonical form; that is to say, if all its invariant factors are equal
to unity.

The λ-matrix U(λ) is unimodular if and only if its determinant is
nonzero but does not depend on λ; that is, it is a nonzero number of the
base field P.

Indeed, if U(λ) ~ E, then these two matrices are associated
with one and the same polynomial d_n(λ). However, d_n(λ) = 1
for the unit matrix. From this it follows that the determinant of the
matrix U(λ), which determinant differs from d_n(λ) only by a nonzero
numerical factor, will be a nonzero number of the field P.
Conversely, if the determinant of the matrix U(λ) is different from
zero and is not dependent on λ, then for this matrix the polynomial
d_n(λ) will be equal to unity and therefore, by (6) of Sec. 59, all
invariant factors e_i(λ) of U(λ), i = 1, 2, ..., n, are equal to unity.

This implies that any nonsingular numerical matrix is a unimodular
λ-matrix. However, a unimodular λ-matrix can be very complicated.
Thus, the λ-matrix

$$\begin{pmatrix} \lambda & \lambda^3 + 5 \\ \lambda^2 - \lambda - 4 & \lambda^4 - \lambda^3 - 4\lambda^2 + 5\lambda - 5 \end{pmatrix}$$

is unimodular, since its determinant is equal to 20; that is to say,
it is different from zero and is not dependent on λ.
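The value of this determinant is easily checked by multiplying out the polynomials. The sketch below is an illustration with hypothetical helper names; polynomials are stored as integer coefficient lists from degree 0 upward.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_sub(p, q):
    """Subtract q from p and drop trailing zero coefficients."""
    n = max(len(p), len(q))
    diff = [(p[k] if k < len(p) else 0) - (q[k] if k < len(q) else 0)
            for k in range(n)]
    while diff and diff[-1] == 0:
        diff.pop()
    return diff


a = [0, 1]                    # lambda
b = [5, 0, 0, 1]              # lambda^3 + 5
c = [-4, -1, 1]               # lambda^2 - lambda - 4
d = [-5, 5, -4, -1, 1]        # lambda^4 - lambda^3 - 4 lambda^2 + 5 lambda - 5

det = poly_sub(poly_mul(a, d), poly_mul(b, c))
print(det)
```

All terms of positive degree cancel, leaving a constant polynomial, which is what unimodularity requires.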

From the theorem proved above it follows that a product of
unimodular λ-matrices is unimodular: it suffices to recall that in matrix
multiplication the determinants are multiplied together.

The λ-matrix U(λ) is unimodular if and only if there is an inverse
matrix which is also a λ-matrix.

Indeed, if we have a nonsingular λ-matrix, then in seeking the
inverse matrix in ordinary fashion we will have to divide the cofactors
of the elements of the given matrix by the determinant of the
matrix, i.e., by some polynomial in λ. Therefore, in the general
case, the elements of the inverse matrix will be rational fractions
in λ and not polynomials in λ; that is, this matrix is not a λ-matrix.
But if a unimodular matrix is given, then we will have to divide
the cofactors only by a nonzero number from the field P; i.e., the
elements of the inverse matrix will be polynomials in λ and therefore
the inverse matrix will itself be a λ-matrix. Conversely, if the
λ-matrix U(λ) has an inverse λ-matrix U^{-1}(λ), then the determinants
of both matrices are polynomials in λ, their product is equal
to 1, and therefore both determinants must be zero-degree polynomials.

There follows from this last remark a supplement to the theorem
just proved: A λ-matrix inverse to a unimodular λ-matrix is unimodular.

The concept of a unimodular matrix is used in the statement
of the following new equivalence criterion of λ-matrices: Two λ-matrices
A(λ) and B(λ) of order n are equivalent if and only if there exist
unimodular λ-matrices U(λ) and V(λ) of the same order n such that

$$B(\lambda) = U(\lambda)\, A(\lambda)\, V(\lambda) \qquad (1)$$


First, we introduce the following concept used in the proof of 
this criterion. We use the term elementary matrix to denote a numeri- 
cal (and, hence, A-) matrix 


1 0 
OE. AE (i) | (2) 


0 . 4 
that differs from the unit matrix in only one way: there is an arbi- 
trary nonzero number o from the field P in some ith position of the 


principal diagonal, 4 < i < n. On the other hand, we will use the 
term elementary matrix for the \-matrix 


G) 
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which differs from the unit matrix.in only one way: an arbitrary 
polynomial g (A) from the ring P [A] occupies the position at the 
intersection of the ith row and the jth column, 1 <i <n, 1 <j< 
<n, isi. 

Every elementary matrix is unimodular. This is quite obvious
since the determinant of matrix (2) is equal to α and, by hypothesis,
α ≠ 0, while the determinant of the matrix (3) is equal to 1.

Performance of any elementary transformation in the λ-matrix A(λ)
is equivalent to multiplying this matrix on the left or on the right by
some elementary matrix.

It will be easy for the reader to verify the truth of the following
four assertions: (1) multiplication of the matrix A(λ) on the left
by the matrix (2) is equivalent to multiplication of the ith row
of A(λ) by the scalar α; (2) multiplication of A(λ) on the right by
matrix (2) is equivalent to multiplication of the ith column of the
matrix A(λ) by the scalar α; (3) multiplication of matrix A(λ)
on the left by matrix (3) is equivalent to adding to the ith row of A(λ)
its jth row multiplied by φ(λ); (4) multiplication of the matrix A(λ)
on the right by matrix (3) is equivalent to adding to the jth column
of A(λ) its ith column multiplied by φ(λ).
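Assertions (3) and (4) can be checked mechanically on a small case. The sketch below is only an illustration under assumptions of its own: polynomials are stored as coefficient lists (lowest degree first), and the 2×2 matrix A(λ), the polynomial φ(λ) and all function names are made up for the check.

```python
# Polynomials in λ are coefficient lists, lowest degree first: [3, 1] is 3 + λ.

def p_trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def p_add(p, q):
    n = max(len(p), len(q))
    return p_trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                   for i in range(n)])

def p_mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return p_trim(out)

def m_mul(X, Y):
    """Product of two square λ-matrices of the same order."""
    n = len(X)
    Z = [[[] for _ in range(n)] for _ in range(n)]
    for r in range(n):
        for c in range(n):
            for t in range(n):
                Z[r][c] = p_add(Z[r][c], p_mul(X[r][t], Y[t][c]))
    return Z

def elementary(n, i, j, phi):
    """Matrix (3): unit matrix with phi in row i, column j (0-based, i != j)."""
    M = [[[1] if r == c else [] for c in range(n)] for r in range(n)]
    M[i][j] = p_trim(phi)
    return M

# Hypothetical data: A(λ) = [[λ, 1], [2, λ²]], φ(λ) = 3 + λ, i = 0, j = 1.
A = [[[0, 1], [1]],
     [[2], [0, 0, 1]]]
phi = [3, 1]
T = elementary(2, 0, 1, phi)

left = m_mul(T, A)    # assertion (3): row i gets row j times φ added to it
right = m_mul(A, T)   # assertion (4): column j gets column i times φ added to it

row_op = [[p_add(A[0][c], p_mul(phi, A[1][c])) for c in range(2)], A[1]]
col_op = [[A[r][0], p_add(A[r][1], p_mul(phi, A[r][0]))] for r in range(2)]

print(left == row_op, right == col_op)
```

Assertions (1) and (2) can be tested the same way by replacing `T` with a diagonal matrix of type (2).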

Let us now take up the proof of our criterion of the equivalence
of λ-matrices. If A(λ) ~ B(λ), then we can proceed from A(λ)
to B(λ) by means of a finite number of elementary transformations.
Replacing each of these transformations by multiplication on the
left or on the right by an elementary matrix, we arrive at the equation

$$B(\lambda) = U_1(\lambda) \cdots U_k(\lambda)\, A(\lambda)\, V_1(\lambda) \cdots V_l(\lambda) \tag{4}$$

where all the matrices U_1(λ), ..., U_k(λ), V_1(λ), ..., V_l(λ) are
elementary and, hence, unimodular. Hence, the matrices

$$U(\lambda) = U_1(\lambda) \cdots U_k(\lambda), \qquad V(\lambda) = V_1(\lambda) \cdots V_l(\lambda) \tag{5}$$

which are products of unimodular matrices will also be unimodular,
and equation (4) will be rewritten as (1). Notice that if, say,
k = 0, i.e., elementary transformations are performed on columns
only, then we simply put U(λ) = E.

This portion of the proof already allows us to make the following
assertion.

A λ-matrix is unimodular if and only if it is representable as a product
of elementary matrices.

True enough, for we have already taken advantage of the fact
that a product of elementary matrices is unimodular. Conversely,
if we have an arbitrary unimodular matrix W(λ), then it is equivalent
to the unit matrix E. Applying the foregoing proof to matrices E
and W(λ) instead of A(λ) and B(λ), we get from (4) the equation

$$W(\lambda) = U_1(\lambda) \cdots U_k(\lambda)\, V_1(\lambda) \cdots V_l(\lambda)$$

which is to say that the matrix W(λ) is represented as a product
of elementary matrices.

It is now easy to prove the converse assertion of our criterion.
Suppose that for the matrices A(λ) and B(λ) there are unimodular
matrices U(λ) and V(λ) such that (1) holds. From what has been
proved, the matrices U(λ) and V(λ) may be represented as products
of elementary matrices; let these be the representations (5). Then (1)
can be rewritten as (4) and, substituting the corresponding elementary
transformation for each multiplication by an elementary matrix,
we finally obtain A(λ) ~ B(λ).

Matrix polynomials. We can take an entirely different view of the
λ-matrix concept and use the term matrix λ-polynomial of order n over
the field P for a polynomial in λ whose coefficients are square matrices
of the same order n with elements from the field P. Its general aspect is

$$A_0 \lambda^k + A_1 \lambda^{k-1} + \ldots + A_{k-1} \lambda + A_k \tag{6}$$

Regarding (in accordance with Sec. 15) the multiplication of
matrix A_i by λ^{k−i}, i = 0, 1, ..., k, as the multiplication by λ^{k−i}
of all elements of the matrix A_i and then performing matrix addition
in accord with that same Sec. 15, we find that any matrix λ-polynomial
of order n may be written as a λ-matrix of order n. Thus,


( a + (5 DEEE a) a+ (o a 
—1 4 O 1 0 —2 0 0 
AMS + A —3A? + 24+ 1 
= (ey A3 + A? — 20 
Conversely, any λ-matrix of order n may be written in the form of
a matrix λ-polynomial of order n. Thus,


342-5 A+ oN, ( 0) af), (3 1 
(nee “allo Too +(, J +( as) 


The correspondence between λ-matrices and matrix λ-polynomials
is one-to-one and isomorphic in the meaning of Sec. 46. Indeed, the
equality of λ-polynomials of the form (6) as matrices is equivalent
to the equality of matrix coefficients of identical powers of λ, and the
multiplication of a matrix by λ is equivalent to its multiplication
by a scalar matrix with λ on the principal diagonal.

Suppose we have a λ-matrix A(λ), and

$$A(\lambda) = A_0 \lambda^k + A_1 \lambda^{k-1} + \ldots + A_{k-1} \lambda + A_k$$

where the matrix A_0 is not a zero matrix. We call the number k
the degree of the λ-matrix A(λ); clearly, this is the highest power
(in λ) of the elements of the matrix A(λ).
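The passage between a λ-matrix and its matrix λ-polynomial is a regrouping of coefficients, which the following sketch makes explicit. The data layout (entries as coefficient lists, lowest degree first, without trailing zeros), the sample matrix and the function names are assumptions chosen for illustration only.

```python
def coefficient_matrices(M):
    """Split a λ-matrix into [A_0, A_1, ..., A_k], highest power of λ first."""
    k = max(len(p) for row in M for p in row) - 1   # degree of the λ-matrix

    def coeff(p, d):
        return p[d] if d < len(p) else 0

    return [[[coeff(p, d) for p in row] for row in M] for d in range(k, -1, -1)]

def evaluate(coeffs, x):
    """Value of A_0 x^k + ... + A_k at a number x, computed entry by entry."""
    k = len(coeffs) - 1
    n = len(coeffs[0])
    return [[sum(C[r][c] * x ** (k - i) for i, C in enumerate(coeffs))
             for c in range(n)] for r in range(n)]

# Hypothetical λ-matrix [[λ² + 1, 3λ], [2, λ² - λ]]
M = [[[1, 0, 1], [0, 3]],
     [[2], [0, -1, 1]]]

coeffs = coefficient_matrices(M)
degree = len(coeffs) - 1
print(degree, coeffs)
```

Here the leading coefficient A_0 is the unit matrix, so the degree of this λ-matrix is 2.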

The view taken of λ-matrices as matrix polynomials permits
developing for λ-matrices a theory of divisibility similar to the
theory of divisibility for numerical polynomials, made more complicated,
true, by the noncommutativity of matrix multiplication
and the presence of divisors of zero. We restrict ourselves to the sole
problem of the division algorithm (with remainder).

Given, over the field P, the nth-order λ-matrices

$$A(\lambda) = A_0 \lambda^k + A_1 \lambda^{k-1} + \ldots + A_{k-1} \lambda + A_k,$$
$$B(\lambda) = B_0 \lambda^l + B_1 \lambda^{l-1} + \ldots + B_{l-1} \lambda + B_l$$

Assume that the matrix B_0 is nonsingular, i.e., there exists a matrix
B_0⁻¹. Then, over the field P it is possible to find λ-matrices Q_1(λ) and
R_1(λ) of the same order n such that

$$A(\lambda) = B(\lambda)\, Q_1(\lambda) + R_1(\lambda) \tag{7}$$

The degree of R_1(λ) is less than the degree of B(λ) or R_1(λ) = 0.
On the other hand, there are, over P, λ-matrices Q_2(λ) and R_2(λ)
of order n such that

$$A(\lambda) = Q_2(\lambda)\, B(\lambda) + R_2(\lambda) \tag{8}$$

The degree of R_2(λ) is less than the degree of B(λ) or R_2(λ) = 0. The
matrices Q_1(λ) and R_1(λ) and also Q_2(λ) and R_2(λ) which satisfy
these conditions are uniquely determined.

The proof of this theorem follows the same lines as that of the
corresponding theorem for numerical polynomials (see Sec. 20). For
instance, let condition (7) be satisfied also by the matrices Q̄_1(λ)
and R̄_1(λ), where the degree of R̄_1(λ) is less than the degree of B(λ).
Then

$$B(\lambda)\, [Q_1(\lambda) - \bar{Q}_1(\lambda)] = \bar{R}_1(\lambda) - R_1(\lambda)$$

The degree of the right side is less than l, but the degree of the left
side (if the square bracket is nonzero) is greater than or equal to l,
since the matrix B_0 is nonsingular. Whence follows the uniqueness
of the matrices Q_1(λ) and R_1(λ).

To prove the existence of such matrices, notice that for k ≥ l
the degree of the difference

$$A(\lambda) - B(\lambda) \cdot B_0^{-1} A_0 \lambda^{k-l}$$

will be strictly less than k; therefore B_0⁻¹A_0λ^{k−l} will be the
highest-degree term of the matrix λ-polynomial Q_1(λ). The continuation
is the same as in Sec. 20. On the other hand, the degree of the difference

$$A(\lambda) - A_0 B_0^{-1} \lambda^{k-l} \cdot B(\lambda)$$

is also strictly less than k, that is, A_0B_0⁻¹λ^{k−l} will be the
highest-degree term of the matrix λ-polynomial Q_2(λ). We see that the
λ-matrices Q_1(λ) and Q_2(λ) [and also R_1(λ) and R_2(λ)] which satisfy
the conditions of the theorem, will indeed be distinct in the general
case.
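The existence argument is an algorithm: strip off the leading term, subtract, and repeat. A minimal sketch follows, under the simplifying assumption (not made in the theorem) that the leading coefficient B_0 of the divisor is the unit matrix, so that no matrix inversion is needed. A matrix polynomial is a list of coefficient matrices, highest power first; the sample data and all names are hypothetical.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def divide(A, B, side):
    """Divide A(λ) by B(λ), assuming the leading coefficient of B is E.
    side='left' : A = B*Q + R, as in (7); side='right': A = Q*B + R, as in (8)."""
    l = len(B) - 1
    rem = [[row[:] for row in C] for C in A]
    quotient = []
    while len(rem) - 1 >= l:
        lead = rem[0]                       # next coefficient of the quotient
        quotient.append(lead)
        for i in range(l + 1):
            prod = mat_mul(B[i], lead) if side == "left" else mat_mul(lead, B[i])
            rem[i] = mat_sub(rem[i], prod)
        rem.pop(0)                          # the leading coefficient is now zero
    return quotient, rem

def evaluate(P, x, n):
    """Value of a matrix polynomial at a number x (Horner's scheme)."""
    acc = [[0] * n for _ in range(n)]
    for C in P:
        acc = [[acc[r][c] * x + C[r][c] for c in range(n)] for r in range(n)]
    return acc

E = [[1, 0], [0, 1]]
A = [[[1, 2], [0, 1]], [[0, 1], [1, 0]], [[3, 0], [0, -1]]]   # degree 2
B = [E, [[0, 1], [2, 0]]]                                     # degree 1

Q1, R1 = divide(A, B, "left")
Q2, R2 = divide(A, B, "right")

def check(x):
    a, b = evaluate(A, x, 2), evaluate(B, x, 2)
    first = mat_add(mat_mul(b, evaluate(Q1, x, 2)), evaluate(R1, x, 2))
    second = mat_add(mat_mul(evaluate(Q2, x, 2), b), evaluate(R2, x, 2))
    return a == first and a == second

print(R1, R2, all(check(x) for x in range(-3, 4)))
```

For this sample pair the two remainders differ, in line with the closing remark of the proof. For a general nonsingular B_0 each step would use B_0⁻¹ times the leading coefficient (or the leading coefficient times B_0⁻¹) in place of `lead`.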

Fundamental theorem on the similarity of matrices. Earlier
we mentioned the fact that as yet we have no way of deciding whether
two numerical matrices A and B (that is, matrices with elements
in the base field P) are similar or not. On the other hand, their
characteristic matrices A − λE and B − λE are λ-matrices and the
question of the equivalence of these matrices is something that can
be resolved effectively. It is therefore clear why the following
theorem is of such great importance.

The matrices A and B with elements in the field P are similar if
and only if their characteristic matrices A − λE and B − λE are
equivalent.

Indeed, let the matrices A and B be similar, i.e., there is, over
the field P, a nonsingular matrix C such that

$$B = C^{-1} A C$$

Then

$$C^{-1} (A - \lambda E)\, C = C^{-1} A C - \lambda\, (C^{-1} E C) = B - \lambda E$$

The nonsingular numerical matrices C⁻¹ and C are, however, unimodular
λ-matrices. We see that the matrix B − λE is obtained by
multiplying the matrix A − λE on the left and on the right by
unimodular matrices, that is, A − λE ~ B − λE.

Proof of the converse assertion is more complicated. Let

$$A - \lambda E \sim B - \lambda E$$

Then there exist unimodular matrices U(λ) and V(λ) such that

$$U(\lambda)\, (A - \lambda E)\, V(\lambda) = B - \lambda E \tag{9}$$

Taking into account that unimodular matrices have inverse matrices
which are λ-matrices, we derive from (9) the following equalities
which will be used in the sequel:

$$\begin{aligned} U(\lambda)\, (A - \lambda E) &= (B - \lambda E)\, V^{-1}(\lambda) \\ (A - \lambda E)\, V(\lambda) &= U^{-1}(\lambda)\, (B - \lambda E) \end{aligned} \tag{10}$$

Since the λ-matrix B − λE has degree 1 in λ, the nonsingular
matrix −E serving as the leading coefficient of the corresponding
matrix polynomial, it follows that we can apply the division algorithm
to the matrices U(λ) and B − λE: there are matrices Q_1(λ)
and R_1 (the latter, if nonzero, must have degree 0 in λ, i.e., it is
independent of λ) such that

$$U(\lambda) = (B - \lambda E)\, Q_1(\lambda) + R_1 \tag{11}$$

Similarly,

$$V(\lambda) = Q_2(\lambda)\, (B - \lambda E) + R_2 \tag{12}$$

Using (11) and (12), we get, from (9),

$$\begin{aligned} R_1 (A - \lambda E) R_2 = {} & (B - \lambda E) - U(\lambda)(A - \lambda E)\, Q_2(\lambda)(B - \lambda E) \\ & - (B - \lambda E)\, Q_1(\lambda)(A - \lambda E)\, V(\lambda) \\ & + (B - \lambda E)\, Q_1(\lambda)(A - \lambda E)\, Q_2(\lambda)(B - \lambda E) \end{aligned}$$

or, by (10),

$$\begin{aligned} R_1 (A - \lambda E) R_2 = {} & (B - \lambda E) - (B - \lambda E)\, V^{-1}(\lambda)\, Q_2(\lambda)(B - \lambda E) \\ & - (B - \lambda E)\, Q_1(\lambda)\, U^{-1}(\lambda)(B - \lambda E) \\ & + (B - \lambda E)\, Q_1(\lambda)(A - \lambda E)\, Q_2(\lambda)(B - \lambda E) \\ = {} & (B - \lambda E)\, \{ E - [V^{-1}(\lambda)\, Q_2(\lambda) + Q_1(\lambda)\, U^{-1}(\lambda) \\ & \qquad - Q_1(\lambda)(A - \lambda E)\, Q_2(\lambda)]\, (B - \lambda E) \} \end{aligned}$$

The square bracket on the right is actually zero, for otherwise,
being a λ-matrix [since both V⁻¹(λ) and U⁻¹(λ) are λ-matrices],
it would at least be of degree 0, but then the degree of the curly
brackets would not be less than 1 and, hence, the degree of the entire
right member would not be less than 2. But this is impossible since
on the left-hand side we have a λ-matrix of degree not exceeding 1.

Thus,

$$R_1 (A - \lambda E) R_2 = B - \lambda E$$

whence, equating the matrix coefficients of identical powers of
λ, we get

$$R_1 A R_2 = B, \tag{13}$$
$$R_1 R_2 = E \tag{14}$$

Equation (14) shows that the numerical matrix R_2 is not only nonzero
but is even nonsingular, and

$$R_1 = R_2^{-1}$$

But then equation (13) takes the form

$$R_2^{-1} A R_2 = B$$

which proves the similarity of the matrices A and B.

We have at the same time learned to find the nonsingular matrix
R_2 which transforms matrix A into matrix B. Namely, if the
matrices A − λE and B − λE are equivalent, then a finite number
of elementary transformations carries the first into the second. Take
those transformations which refer to columns; denote by V(λ) the
product of the corresponding elementary matrices taken in the
same order. Then divide V(λ) by B − λE and perform the division
so that the quotient is on the left of the divisor [see (8)]. The
remainder of this division will be just the matrix R_2.

Actually, this division need not be performed; one can take
advantage of the following lemma, which will also be of use in Sec. 62.

Lemma. Let

$$V(\lambda) = V_0 \lambda^s + V_1 \lambda^{s-1} + \ldots + V_{s-1} \lambda + V_s, \qquad V_0 \neq 0 \tag{15}$$

If

$$\begin{aligned} V(\lambda) &= (\lambda E - B)\, Q_1(\lambda) + R_1, \\ V(\lambda) &= Q_2(\lambda)\, (\lambda E - B) + R_2 \end{aligned} \tag{16}$$

then

$$\begin{aligned} R_1 &= B^s V_0 + B^{s-1} V_1 + \ldots + B V_{s-1} + V_s, \\ R_2 &= V_0 B^s + V_1 B^{s-1} + \ldots + V_{s-1} B + V_s \end{aligned} \tag{17}$$

It suffices to prove the first of these two assertions, because the
second is proved similarly. The proof consists in direct verification
of the validity of (16) if the polynomial V(λ) is replaced by its
notation (15), if (17) is substituted for R_1, and if in place of Q_1(λ) we
take the polynomial

$$\begin{aligned} Q_1(\lambda) = {} & V_0 \lambda^{s-1} + (B V_0 + V_1)\, \lambda^{s-2} + (B^2 V_0 + B V_1 + V_2)\, \lambda^{s-3} \\ & + \ldots + (B^{s-1} V_0 + B^{s-2} V_1 + \ldots + V_{s-1}) \end{aligned}$$

This verification is left to the reader.
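The verification left to the reader amounts to Horner's scheme with matrix coefficients, and a machine can carry it out on a sample. In the sketch below the matrices V_0, V_1, V_2 and B, as well as every function name, are hypothetical; the coefficients of Q_1(λ) are the intermediate Horner values, exactly as in the formula for Q_1(λ) in the proof.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def left_division(V, B):
    """V(λ) = (λE - B) Q_1(λ) + R_1; returns (coefficients of Q_1, R_1)."""
    steps = [V[0]]
    for C in V[1:]:
        steps.append(mat_add(mat_mul(B, steps[-1]), C))   # B*previous + V_i
    return steps[:-1], steps[-1]

def right_division(V, B):
    """V(λ) = Q_2(λ) (λE - B) + R_2; returns (coefficients of Q_2, R_2)."""
    steps = [V[0]]
    for C in V[1:]:
        steps.append(mat_add(mat_mul(steps[-1], B), C))   # previous*B + V_i
    return steps[:-1], steps[-1]

def expand_left(Q, R, B):
    """Coefficients of (λE - B) Q(λ) + R, highest power first."""
    n, s = len(B), len(Q)
    zero = [[0] * n for _ in range(n)]
    out = []
    for i in range(s + 1):
        upper = Q[i] if i < s else R
        lower = mat_mul(B, Q[i - 1]) if i >= 1 else zero
        out.append(mat_sub(upper, lower))
    return out

# Hypothetical data: V(λ) = V_0 λ² + V_1 λ + V_2
V = [[[1, 2], [0, 1]], [[0, 1], [1, 0]], [[3, 0], [0, -1]]]
B = [[0, -1], [-2, 0]]

Q1, R1 = left_division(V, B)
Q2, R2 = right_division(V, B)
print(R1, R2, expand_left(Q1, R1, B) == V)
```

The value `R1` is B²V_0 + BV_1 + V_2 and `R2` is V_0B² + V_1B + V_2, as (17) prescribes, and multiplying out (λE − B)Q_1(λ) + R_1 restores the coefficients of V(λ).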
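Example. Given the matrices

$$A = \begin{pmatrix} -2 & 1 \\ 0 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} -10 & -4 \\ 26 & 11 \end{pmatrix}$$

Their characteristic matrices are equivalent since they can be reduced to one
and the same canonical form

$$\begin{pmatrix} 1 & 0 \\ 0 & \lambda^2 - \lambda - 6 \end{pmatrix}$$

The matrices A and B are thus similar.

To find the matrix R_2 that transforms A into B, let us find some chain
of elementary transformations that carries A − λE into B − λE. Thus,

$$\begin{aligned} A - \lambda E = \begin{pmatrix} -2 - \lambda & 1 \\ 0 & 3 - \lambda \end{pmatrix} &\sim \begin{pmatrix} -2 - \lambda & 1 \\ -16 - 8\lambda & 11 - \lambda \end{pmatrix} \sim \begin{pmatrix} 8 + 4\lambda & -4 \\ -16 - 8\lambda & 11 - \lambda \end{pmatrix} \\ &\sim \begin{pmatrix} 40 + 4\lambda & -4 \\ -104 & 11 - \lambda \end{pmatrix} \sim \begin{pmatrix} -10 - \lambda & -4 \\ 26 & 11 - \lambda \end{pmatrix} = B - \lambda E \end{aligned}$$

The last two transformations refer to columns: to the first column we add the
second multiplied by −8 and then we multiply the first column by −1/4. The
product of the corresponding elementary matrices will be

$$V(\lambda) = \begin{pmatrix} 1 & 0 \\ -8 & 1 \end{pmatrix} \begin{pmatrix} -\frac{1}{4} & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -\frac{1}{4} & 0 \\ 2 & 1 \end{pmatrix}$$

This matrix does not depend on λ and therefore it is the sought-for matrix R_2.
Of course, the matrix that transforms A into B is by no means determined
uniquely. For example, the matrix

$$\begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix}$$

will also be of that kind.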
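The outcome of the example can be confirmed by direct multiplication. The sketch below assumes the entries A = (−2, 1; 0, 3) and B = (−10, −4; 26, 11) and the two transforming matrices (−1/4, 0; 2, 1) and (3, 1; 2, 1); exact rational arithmetic is used, and the helper names are illustrative.

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transform(A, R):
    """Return R^{-1} A R."""
    return mat_mul(inverse_2x2(R), mat_mul(A, R))

A = [[F(-2), F(1)], [F(0), F(3)]]
B = [[F(-10), F(-4)], [F(26), F(11)]]
R = [[F(-1, 4), F(0)], [F(2), F(1)]]   # product of the two column transformations
S = [[F(3), F(1)], [F(2), F(1)]]       # another transforming matrix

print(transform(A, R) == B, transform(A, S) == B)
```

Any nonsingular matrix X with AX = XB would serve equally well, which is why the transforming matrix is not unique.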
61. Jordan Normal Form


We will now consider nth-order square matrices with elements 
in the field P. We will isolate a special type called Jordan matrices, 
and it will be shown that these matrices serve as a normal form for 
a very broad class of matrices. Namely, matrices, all the characteristic 
roots of which lie in the base field P (and only such matrices) are similar 
to certain Jordan matrices; we say that they can be reduced to a Jordan 
normal form. It will then follow, if for the field P we take the field 
of complex numbers, that any matrix with complex elements can be 
reduced to a Jordan normal form in the field of complex numbers. 

We will need some definitions. A kth-order Jordan submatrix
referring to the number λ_0 is a matrix of order k, 1 ≤ k ≤ n, of the
form

$$\begin{pmatrix} \lambda_0 & 1 & & & 0 \\ & \lambda_0 & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_0 & 1 \\ 0 & & & & \lambda_0 \end{pmatrix} \tag{1}$$

In other words, one and the same number λ_0 from the field P occupies
the principal diagonal, with unity along the diagonal immediately
above and zero elsewhere. Thus,

$$(\lambda_0), \qquad \begin{pmatrix} \lambda_0 & 1 \\ 0 & \lambda_0 \end{pmatrix}, \qquad \begin{pmatrix} \lambda_0 & 1 & 0 \\ 0 & \lambda_0 & 1 \\ 0 & 0 & \lambda_0 \end{pmatrix}$$

are, respectively, Jordan submatrices of first, second and third order.
A Jordan matrix of order n is a matrix of order n having the form

$$J = \begin{pmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_s \end{pmatrix} \tag{2}$$

The elements along the principal diagonal are Jordan submatrices
J_1, J_2, ..., J_s of certain orders, not necessarily distinct, referring
to certain numbers (not necessarily distinct either) lying in the
field P. All other positions have zeros. Here, s ≥ 1, that is to say,
one Jordan submatrix of order n belongs to Jordan matrices of this
order, and, naturally, s ≤ n.

It may be noted (though this will not be used in what follows)
that the structure of the Jordan matrix can be described without
resorting to the concept of the Jordan submatrix. It is obvious,
namely, that the matrix J is a Jordan matrix if and only if it has
the form

$$\begin{pmatrix} \lambda_1 & \varepsilon_1 & & & 0 \\ & \lambda_2 & \varepsilon_2 & & \\ & & \ddots & \ddots & \\ & & & \lambda_{n-1} & \varepsilon_{n-1} \\ 0 & & & & \lambda_n \end{pmatrix}$$

where λ_i, i = 1, 2, ..., n, are arbitrary numbers in P and every
ε_i, i = 1, 2, ..., n − 1, is equal to unity or zero; note that if
ε_i = 1, then λ_i = λ_{i+1}.

Diagonal matrices are a special case of Jordan matrices. These
are Jordan matrices whose Jordan submatrices are of order 1.

Our immediate aim is to find the canonical form of the characteristic
matrix J − λE of an arbitrary Jordan matrix J of order n.
We will first find the canonical form of the characteristic matrix

$$\begin{pmatrix} \lambda_0 - \lambda & 1 & & & 0 \\ & \lambda_0 - \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_0 - \lambda & 1 \\ 0 & & & & \lambda_0 - \lambda \end{pmatrix} \tag{3}$$

of a single Jordan submatrix (1) of order k. Computing the determinant
of this matrix and recalling that the leading coefficient of the
polynomial d_k(λ) must be equal to 1, we find that

$$d_k(\lambda) = (\lambda - \lambda_0)^k$$

On the other hand, among the (k − 1)th-order minors of the matrix
(3) there is a minor equal to unity; this is the minor obtained by
deleting the first column and the last row of the matrix. Therefore

$$d_{k-1}(\lambda) = 1$$

From this it follows that the following kth-order λ-matrix

$$\begin{pmatrix} 1 & & & 0 \\ & \ddots & & \\ & & 1 & \\ 0 & & & (\lambda - \lambda_0)^k \end{pmatrix} \tag{4}$$

is the canonical form of the matrix (3).


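We now prove the following lemma.

If the polynomials φ_1(λ), φ_2(λ), ..., φ_t(λ) from the ring P[λ]
are pairwise relatively prime, the following equivalence holds true:

$$\begin{pmatrix} \varphi_1(\lambda) & & & 0 \\ & \varphi_2(\lambda) & & \\ & & \ddots & \\ 0 & & & \varphi_t(\lambda) \end{pmatrix} \sim \begin{pmatrix} 1 & & & 0 \\ & \ddots & & \\ & & 1 & \\ 0 & & & \prod_{i=1}^{t} \varphi_i(\lambda) \end{pmatrix}$$

It is evidently sufficient to consider the case of t = 2. Since the
polynomials φ_1(λ) and φ_2(λ) are relatively prime, there are polynomials
u_1(λ) and u_2(λ) in the ring P[λ] such that

$$\varphi_1(\lambda)\, u_1(\lambda) + \varphi_2(\lambda)\, u_2(\lambda) = 1$$

Therefore

$$\begin{aligned} \begin{pmatrix} \varphi_1(\lambda) & 0 \\ 0 & \varphi_2(\lambda) \end{pmatrix} &\sim \begin{pmatrix} \varphi_1(\lambda) & \varphi_1(\lambda) u_1(\lambda) \\ 0 & \varphi_2(\lambda) \end{pmatrix} \sim \begin{pmatrix} \varphi_1(\lambda) & \varphi_1(\lambda) u_1(\lambda) + \varphi_2(\lambda) u_2(\lambda) \\ 0 & \varphi_2(\lambda) \end{pmatrix} = \begin{pmatrix} \varphi_1(\lambda) & 1 \\ 0 & \varphi_2(\lambda) \end{pmatrix} \\ &\sim \begin{pmatrix} \varphi_1(\lambda) & 1 \\ -\varphi_1(\lambda) \varphi_2(\lambda) & 0 \end{pmatrix} \sim \begin{pmatrix} 0 & 1 \\ -\varphi_1(\lambda) \varphi_2(\lambda) & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 \\ 0 & -\varphi_1(\lambda) \varphi_2(\lambda) \end{pmatrix} \sim \begin{pmatrix} 1 & 0 \\ 0 & \varphi_1(\lambda) \varphi_2(\lambda) \end{pmatrix} \end{aligned}$$

which is what we set out to prove.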
Let us now consider the characteristic matrix

$$J - \lambda E = \begin{pmatrix} J_1 - \lambda E_1 & & & 0 \\ & J_2 - \lambda E_2 & & \\ & & \ddots & \\ 0 & & & J_s - \lambda E_s \end{pmatrix} \tag{5}$$

of the Jordan matrix J of type (2); here, E_i, i = 1, 2, ..., s,
is a unit matrix of the same order as the submatrix J_i. Let the
Jordan submatrices of the matrix J refer to the following distinct
numbers: λ_1, λ_2, ..., λ_t, where t ≤ s. Furthermore, let there refer
to the number λ_i, i = 1, 2, ..., t, q_i Jordan submatrices, q_i ≥ 1,
and let the orders of the submatrices (arranged in nonincreasing
order) be

$$k_{i1} \geq k_{i2} \geq \ldots \geq k_{i q_i} \tag{6}$$

Let it be noted (though we will not make use of this fact) that

$$\sum_{i=1}^{t} q_i = s, \qquad \sum_{i=1}^{t} \sum_{j=1}^{q_i} k_{ij} = n$$

Applying elementary transformations to the rows and columns
of matrix (5) which pass through the submatrix J_i − λE_i of this
matrix, we will quite obviously not involve the other diagonal
submatrices, whence it follows that it is possible, in matrix (5),
to replace by means of elementary transformations every submatrix
J_i − λE_i, i = 1, 2, ..., s, by a corresponding submatrix of the
type (4). In other words, the matrix J − λE is equivalent to a diagonal
matrix, the diagonal elements of which consist (aside from a certain
number of units) of the following polynomials which correspond to all
Jordan submatrices of the matrix J:


$$\begin{matrix} (\lambda - \lambda_1)^{k_{11}}, & (\lambda - \lambda_1)^{k_{12}}, & \ldots, & (\lambda - \lambda_1)^{k_{1 q_1}}, \\ (\lambda - \lambda_2)^{k_{21}}, & (\lambda - \lambda_2)^{k_{22}}, & \ldots, & (\lambda - \lambda_2)^{k_{2 q_2}}, \\ \cdots & \cdots & \cdots & \cdots \\ (\lambda - \lambda_t)^{k_{t1}}, & (\lambda - \lambda_t)^{k_{t2}}, & \ldots, & (\lambda - \lambda_t)^{k_{t q_t}} \end{matrix} \tag{7}$$

We do not indicate the positions of the polynomials (7) on the
principal diagonal, since the diagonal elements of any diagonal
λ-matrix can be arbitrarily rearranged by interchanging rows and
like columns. This is worth bearing in mind for the future.

Let q be the largest of the numbers q_i, i = 1, 2, ..., t. Denote
by e_{n−j+1}(λ) the product of polynomials in the jth column of array
(7), j = 1, 2, ..., q, that is,

$$e_{n-j+1}(\lambda) = \prod_{i=1}^{t} (\lambda - \lambda_i)^{k_{ij}} \tag{8}$$

If there are certain vacancies in the jth column (it may happen
that q_i < j for certain i), then the corresponding factors in (8) are
considered to be unity. Since, by hypothesis, the numbers
λ_1, λ_2, ..., λ_t are distinct, the powers of the linear binomials in
the jth column of array (7) are pairwise relatively prime. Therefore,
on the basis of the lemma proved above, they can, by means of
elementary transformations, be replaced in the diagonal matrix
at hand by their product e_{n−j+1}(λ) and by a certain number
of units.

Doing this for j = 1, 2, ..., q, we find that

$$J - \lambda E \sim \begin{pmatrix} 1 & & & & & & 0 \\ & \ddots & & & & & \\ & & 1 & & & & \\ & & & e_{n-q+1}(\lambda) & & & \\ & & & & \ddots & & \\ & & & & & e_{n-1}(\lambda) & \\ 0 & & & & & & e_n(\lambda) \end{pmatrix} \tag{9}$$

This is the desired canonical form of the matrix J − λE. Indeed, the
leading coefficients of all polynomials on the principal diagonal
of (9) are equal to unity and each of the polynomials is exactly
divisible by the preceding one, by Condition (6).
Example. Let 
0 


5 1 
05 


; E 
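For this Jordan matrix of order 9, the polynomial array (7) is of the form

$$\begin{matrix} (\lambda - 2)^3, & \lambda - 2, & \lambda - 2, \\ (\lambda - 5)^2, & (\lambda - 5)^2 & \end{matrix}$$

Therefore, the invariant factors of the J matrix are the polynomials

$$e_9(\lambda) = (\lambda - 2)^3 (\lambda - 5)^2, \qquad e_8(\lambda) = (\lambda - 2)(\lambda - 5)^2, \qquad e_7(\lambda) = \lambda - 2$$

whereas e_6(λ) = ... = e_1(λ) = 1.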
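The passage from the Jordan submatrices to array (7) and then to the products (8) is easy to mechanize. The sketch below is an illustration with made-up input, not the book's example: each Jordan submatrix is given as a pair (number, order), and each invariant factor is returned in factored form as a dictionary {number: exponent}.

```python
from collections import defaultdict

def invariant_factors(blocks):
    """Invariant factors different from unity, listed as e_n, e_{n-1}, ...
    Each factor is a dict {number: exponent}, i.e. a product of (λ - number)^exponent."""
    rows = defaultdict(list)                 # rows of array (7), one per distinct number
    for number, order in blocks:
        rows[number].append(order)
    for number in rows:
        rows[number].sort(reverse=True)      # nonincreasing orders, condition (6)
    q = max(len(row) for row in rows.values())
    factors = []
    for j in range(q):                       # jth column of array (7), formula (8)
        factors.append({number: row[j] for number, row in rows.items() if j < len(row)})
    return factors

# Hypothetical Jordan matrix of order 7, its submatrices listed in diagonal order
blocks = [(2, 3), (5, 1), (2, 1), (5, 2)]
e = invariant_factors(blocks)
print(e)
```

The order in which the submatrices are listed does not affect the result, which is the observation used in the proof of the next theorem.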


Now that we have learned how, judging by the form of a given
Jordan matrix J, to write down the canonical form of its characteristic
matrix straightaway, we can prove the following theorem.

Two Jordan matrices are similar if and only if they consist of the
same Jordan submatrices, that is to say, if they differ at most solely
in the order of these submatrices on the principal diagonal.

Actually, the polynomial array (7) was completely determined
by the set of Jordan submatrices of the Jordan matrix J and did
not in the least reflect the arrangement of the Jordan submatrices
along the principal diagonal of the matrix. It then follows that if
Jordan matrices J and J′ have the same set of Jordan submatrices,
then they are associated with one and the same array (7) of polynomials
and therefore the same polynomials (8). Thus, the characteristic
matrices J − λE and J′ − λE have the same invariant factors,
that is to say, they are equivalent, and therefore the matrices J
and J′ are similar.

Conversely, if the Jordan matrices J and J′ are similar, then
their characteristic matrices have the same invariant factors. Let
the polynomials (8) for j = 1, 2, ..., q, be those invariant factors
which are different from unity. But the polynomial array (7) can be
restored from the polynomials (8). Namely, the polynomials (8)
can be factored into a product of powers of linear factors, since,
as has already been proved, this property is possessed by the invariant
factors of the characteristic matrix of any Jordan matrix.
Array (7) just consists of all those maximal powers of the linear
factors into which the polynomials (8) are factored. Finally, using
array (7) we can restore the Jordan submatrices of the original Jordan
matrices: to every polynomial (λ − λ_i)^{k_ij} of (7) there corresponds
a Jordan submatrix of order k_ij that refers to the number λ_i. This
proves that the matrices J and J′ consist of the same Jordan submatrices
and differ at most in their order alone.

One consequence of this theorem is that a Jordan matrix similar
to a diagonal matrix is diagonal and that two diagonal matrices are
similar if and only if they can be obtained from one another by permuting
the numbers on the principal diagonal.

Reducing a matrix to Jordan normal form. If a matrix A with 
elements from the field P can be reduced to a Jordan normal form, 
i.e., is similar to a Jordan matrix, then, as follows from the theorem 
that was proved above, the Jordan normal form is determined uniquely 
for matrix A to within the order of the Jordan submatrices on the prin- 
cipal diagonal. The condition that allows a matrix A to be so reduced 
is given in the following theorem, the proof of which offers a prac- 
tical procedure for finding a Jordan matrix similar to A if such 
a Jordan matrix exists. Note that reducibility over the field P means 
that all the elements of the matrix undergoing transformation are 
in P. 

Matrix A with elements in the field P can be reduced over P to the
Jordan normal form if and only if all the characteristic roots of A lie
in the base field P itself.

Indeed, if matrix A is similar to the Jordan matrix J, these
two matrices have the same characteristic roots. However, the
characteristic roots of J are easily found: since the determinant of the
matrix J − λE is equal to the product of its elements on the principal
diagonal, the polynomial |J − λE| can be factored over P
into linear factors and its roots are numbers (and only these numbers)
on the principal diagonal of J.

Conversely, let all characteristic roots of matrix A be in the
field P. If the different-from-unity invariant factors of the matrix
A − λE are

$$e_{n-q+1}(\lambda), \ \ldots, \ e_{n-1}(\lambda), \ e_n(\lambda) \tag{10}$$

then

$$|A - \lambda E| = (-1)^n\, e_{n-q+1}(\lambda) \cdots e_{n-1}(\lambda)\, e_n(\lambda)$$

Indeed, the determinants of the matrix A − λE and its canonical
matrix can only differ in a constant factor that is actually equal to
(−1)^n, since such, precisely, is the leading coefficient of the
characteristic polynomial |A − λE|. Thus, among the polynomials
(10) there are none equal to zero, the sum of the degrees of these
polynomials is equal to n, and all can be factored over the field P
into linear factors, which is due to the fact that, by hypothesis, the
polynomial |A − λE| has such a factorization.

Let (8) be factorizations of the polynomials (10) into products
of the powers of the linear factors. We use the term elementary divisors
of the polynomial e_{n−j+1}(λ), j = 1, 2, ..., q, for powers (different
from unity) of the various linear binomials entering into its
factorization (8), that is,

$$(\lambda - \lambda_1)^{k_{1j}}, \ (\lambda - \lambda_2)^{k_{2j}}, \ \ldots, \ (\lambda - \lambda_t)^{k_{tj}}$$

We call the elementary divisors of all polynomials (10) the elementary
divisors of the matrix A and write them down in the form
of array (7).

Let us now take a Jordan matrix J of order n composed of Jordan
submatrices defined as follows: with each elementary divisor
(λ − λ_i)^{k_ij} of matrix A we associate a Jordan submatrix of order k_ij
referring to the number λ_i. It is evident that only the polynomials
(10) are invariant factors, different from unity, of the matrix J − λE.
Therefore, matrices A − λE and J − λE are equivalent and, hence,
matrix A is similar to the Jordan matrix J.


Example. Given a matrix

$$A = \begin{pmatrix} -16 & -17 & 87 & -108 \\ 8 & 9 & -42 & 54 \\ -3 & -3 & 16 & -18 \\ -1 & -1 & 6 & -8 \end{pmatrix}$$

Reducing the matrix A − λE to canonical form in the usual way, we find that
the invariant factors different from unity of this matrix are the polynomials

$$e_4(\lambda) = (\lambda - 1)^2 (\lambda + 2), \qquad e_3(\lambda) = \lambda - 1$$

We see that matrix A can be reduced to the Jordan normal form even in the
field of rational numbers. Its elementary divisors are the polynomials
(λ − 1)², λ − 1 and λ + 2 and so the matrix

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix}$$

is the Jordan normal form of the matrix A.
If we wanted to find the nonsingular matrix that transforms A to J, we
would have to make use of the remarks made at the end of Sec. 60.
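Writing down the Jordan matrix from the elementary divisors is a purely mechanical step. A minimal sketch, with an illustrative function name and the elementary divisors (λ − 1)², λ − 1, λ + 2 of the example passed as (number, exponent) pairs:

```python
def jordan_from_elementary_divisors(divisors):
    """Build a Jordan matrix with one submatrix of order k referring to the
    number c for every elementary divisor (λ - c)^k, in the order given."""
    n = sum(k for _, k in divisors)
    J = [[0] * n for _ in range(n)]
    pos = 0
    for c, k in divisors:
        for t in range(k):
            J[pos + t][pos + t] = c              # principal diagonal
            if t + 1 < k:
                J[pos + t][pos + t + 1] = 1      # unity just above the diagonal
        pos += k
    return J

# (λ - 1)², λ - 1, λ + 2
J = jordan_from_elementary_divisors([(1, 2), (1, 1), (-2, 1)])
for row in J:
    print(row)
```

Listing the divisors in another order permutes the submatrices along the diagonal and gives a similar Jordan matrix.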


Finally, on the basis of the foregoing results we can prove the 
following necessary and sufficient condition for reducing a matrix 
to diagonal form, a condition that immediately yields the sufficient 
criterion of reducibility to diagonal form that was proved in Sec. 33. 

An nth-order matrix A with elements in the field P can be reduced
to diagonal form if and only if all the roots of the last invariant factor
e_n(λ) of its characteristic matrix are in P and there are no multiple
roots among them.

Indeed, reducibility of a matrix to diagonal form is equivalent
to reducibility to a Jordan form such that all Jordan submatrices
have order 1. In other words, all elementary divisors of matrix A
must be polynomials of degree one. However, since all invariant
factors of the matrix A − λE are divisors of the polynomial e_n(λ),
the last condition is equivalent to all elementary divisors of the
polynomial e_n(λ) having degree one, which is what we set out to prove.


62. Minimal Polynomials 

Suppose we have a square matrix A of order n with elements in
the field P. If

$$f(\lambda) = \alpha_0 \lambda^k + \alpha_1 \lambda^{k-1} + \ldots + \alpha_{k-1} \lambda + \alpha_k$$

is an arbitrary polynomial in the ring P[λ], then the matrix

$$f(A) = \alpha_0 A^k + \alpha_1 A^{k-1} + \ldots + \alpha_{k-1} A + \alpha_k E$$

is called the value of the polynomial f(λ) for λ = A. Note, in this
respect, that the constant term of the polynomial f(λ) is multiplied
by the zero power of the matrix A, that is to say, by the unit matrix E.
It can be verified readily that if

$$f(\lambda) = \varphi(\lambda) + \psi(\lambda)$$

or

$$f(\lambda) = u(\lambda)\, v(\lambda)$$

then

$$f(A) = \varphi(A) + \psi(A)$$

and, respectively,

$$f(A) = u(A)\, v(A)$$

If the polynomial f(λ) is annihilated by the matrix A, that is,

$$f(A) = 0$$

then A will be called the matrix root or (where no confusion is
possible) simply the root of the polynomial f(λ).
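The value f(A) and the rule f(A) = u(A)v(A) can be tried out numerically. The following sketch assumes the matrix A = (−2, 1; 0, 3) and the polynomials u(λ) = λ + 2, v(λ) = λ − 3 as sample data; polynomials are coefficient lists with the highest power first, and the function names are illustrative.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

def poly_value(coeffs, A):
    """f(A) = α_0 A^k + ... + α_{k-1} A + α_k E, by Horner's scheme."""
    n = len(A)
    acc = [[0] * n for _ in range(n)]
    for a in coeffs:
        acc = mat_mul(acc, A)
        for i in range(n):
            acc[i][i] += a          # the constant is multiplied by the unit matrix E
    return acc

def poly_mul(u, v):
    out = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out

A = [[-2, 1], [0, 3]]
u = [1, 2]            # λ + 2
v = [1, -3]           # λ - 3
f = poly_mul(u, v)    # λ² - λ - 6

fA = poly_value(f, A)
print(f, fA, fA == mat_mul(poly_value(u, A), poly_value(v, A)))
```

For this sample f(A) turns out to be the zero matrix, so A is a matrix root of f(λ), although neither u(A) nor v(A) is zero.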

Every matrix A serves as a root of some nonzero polynomial.

We know for a fact that all square matrices of order n constitute
an n²-dimensional vector space over the field P. From this it follows
that the system of n² + 1 matrices

$$A^{n^2}, \ A^{n^2 - 1}, \ \ldots, \ A, \ E$$

is linearly dependent over P, that is, in P there are elements
α_0, α_1, ..., α_{n²−1}, α_{n²}, not all zero, such that

$$\alpha_0 A^{n^2} + \alpha_1 A^{n^2 - 1} + \ldots + \alpha_{n^2 - 1} A + \alpha_{n^2} E = 0$$

Thus, matrix A proved to be a root of the nonzero polynomial

$$\varphi(\lambda) = \alpha_0 \lambda^{n^2} + \alpha_1 \lambda^{n^2 - 1} + \ldots + \alpha_{n^2 - 1} \lambda + \alpha_{n^2}$$

whose degree does not exceed n².

The matrix A is also a root of certain polynomials whose leading
coefficients are equal to unity: it suffices to take any nonzero
polynomial that can be annihilated by A and divide it by its leading
coefficient. The polynomial of lowest degree with leading coefficient 1
that can be annihilated by A is called the minimal polynomial of the
matrix A. Notice that the minimal polynomial of A is uniquely defined,
since the difference of two such polynomials would have a lower
degree than each one separately, but it would also be annihilable
by the matrix A.

Any polynomial f(λ) that is annihilable by the matrix A is exactly
divisible by the minimal polynomial m(λ) of this matrix.

Actually, if

$$f(\lambda) = m(\lambda)\, q(\lambda) + r(\lambda)$$

where the degree of r(λ) is less than the degree of m(λ), then

$$f(A) = m(A)\, q(A) + r(A)$$

and from f(A) = m(A) = 0 it follows that r(A) = 0, which
contradicts the definition of a minimal polynomial unless r(λ) is
the zero polynomial.

Let us prove the following theorem.

The minimal polynomial of a matrix A coincides with the last
invariant factor e_n(λ) of the characteristic matrix A − λE.

Proof. Retaining notations and using the results of Sec. 59,
we can write the equation

$$(-1)^n\, |A - \lambda E| = d_{n-1}(\lambda)\, e_n(\lambda) \tag{1}$$

whence it follows, for one thing, that the polynomials e_n(λ) and
d_{n−1}(λ) are not zero polynomials. Next, denote by B(λ) the adjoint
of the matrix A − λE (see Sec. 14),

$$B(\lambda) = (A - \lambda E)^*$$

As follows from (3), Sec. 14, the equation

$$(A - \lambda E)\, B(\lambda) = |A - \lambda E|\, E \tag{2}$$

holds true. On the other hand, since the elements of B(λ) are
(n − 1)th-order minors (with plus or minus signs) of the matrix A − λE,
and only these minors, and the polynomial d_{n−1}(λ) is the greatest
common divisor of all these minors, it follows that

$$B(\lambda) = d_{n-1}(\lambda)\, C(\lambda) \tag{3}$$

the greatest common divisor of the elements of matrix C(λ) being
equal to 1.
From equations (2), (3) and (1) follows the equation

$$(A - \lambda E)\, d_{n-1}(\lambda)\, C(\lambda) = (-1)^n\, d_{n-1}(\lambda)\, e_n(\lambda)\, E$$

We can divide through by the nonzero factor d_{n−1}(λ), as follows
from the general remark that if φ(λ) is a nonzero polynomial and
D(λ) = (d_ij(λ)) is a nonzero λ-matrix [let d_st(λ) ≠ 0], then the
(s, t) position in the matrix φ(λ)D(λ) will be occupied by the
nonzero element φ(λ)d_st(λ). Thus,

$$(A - \lambda E)\, C(\lambda) = (-1)^n\, e_n(\lambda)\, E$$

whence

$$e_n(\lambda)\, E = (\lambda E - A)\, [(-1)^{n+1} C(\lambda)] \tag{4}$$

This equation shows that the remainder resulting from "left"
division of the λ-matrix in the left member by the binomial λE − A
is equal to zero. From the lemma proved at the end of Sec. 60 it
follows, however, that this remainder is equal to the matrix
e_n(A)E = e_n(A). True enough, the matrix e_n(λ)E may be written as a
matrix λ-polynomial whose coefficients are scalar matrices, i.e., such
as commute with the matrix A. Thus,

$$e_n(A) = 0$$

which is to say that the polynomial e_n(λ) is indeed annihilated by A.


From this it follows that the polynomial e_n(λ) is exactly divisible
by the minimal polynomial m(λ) of matrix A,

$$e_n(\lambda) = m(\lambda)\, q(\lambda) \tag{5}$$

It is clear that the leading coefficient of the polynomial q(λ) is
equal to unity.

Since m(A) = 0, then, by the same lemma of Sec. 60, the remainder
after left-division of the λ-matrix m(λ)E by the binomial
λE − A is again equal to zero; that is,

$$m(\lambda)\, E = (\lambda E - A)\, Q(\lambda) \tag{6}$$

The equations (5), (4) and (6) lead to the equation

$$(\lambda E - A)\, [(-1)^{n+1} C(\lambda)] = (\lambda E - A)\, [Q(\lambda)\, q(\lambda)]$$

The common factor λE − A can be cancelled out of both sides
since the leading coefficient E of this matrix polynomial is a
nonsingular matrix. Thus,

$$C(\lambda) = (-1)^{n+1}\, Q(\lambda)\, q(\lambda)$$

We recall, however, that the greatest common divisor of the elements
of matrix C(λ) is unity. Therefore, the polynomial q(λ)
must be of degree zero, and since its leading coefficient is unity,
q(λ) = 1. Thus, by (5),

$$e_n(\lambda) = m(\lambda)$$

which completes the proof.

Since, by (1), the characteristic polynomial of matrix A is exactly
divisible by the polynomial e_n(λ), there follows from the theorem
just proved the Cayley-Hamilton theorem.

Cayley-Hamilton Theorem. Every matrix is a root of its characte- 
ristic polynomial. 
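
The Cayley-Hamilton theorem can be checked numerically on a small example. The following Python sketch is only an illustration: the helper names `mat_mul`, `char_poly` and `poly_at_matrix` are assumptions of the sketch, and the coefficients are computed by the Faddeev-LeVerrier recurrence rather than by the argument with invariant factors used above. The sketch works with $\det(\lambda E - A)$, which differs from $|A - \lambda E|$ only by the factor $(-1)^n$, so annihilation by A is not affected.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [1, c_(n-1), ..., c_0] of det(lambda*E - A).

    Uses the Faddeev-LeVerrier recurrence (a choice made for this sketch,
    not the method of the text).
    """
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A * M_(k-1) + (last coefficient found) * E
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = mat_mul(A, M)
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)
    return coeffs


def poly_at_matrix(coeffs, A):
    """Value of the polynomial at the matrix A, by Horner's scheme."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        RA = mat_mul(result, A)
        result = [[RA[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
p = char_poly(A)
print(p)                      # coefficients of the characteristic polynomial
print(poly_at_matrix(p, A))   # every entry should be zero
```

Exact rational arithmetic (`Fraction`) is used so that the zero matrix is obtained exactly and not merely up to rounding.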

The minimal polynomial of a linear transformation. Let us
first prove the following assertion.

If matrices A and B are similar and if the polynomial $f(\lambda)$ is
annihilated by matrix A, then it is also annihilated by matrix B.

Indeed, let

$$B = C^{-1} A C$$

If

$$f(\lambda) = a_0 \lambda^k + a_1 \lambda^{k-1} + \ldots + a_{k-1} \lambda + a_k$$

then

$$a_0 A^k + a_1 A^{k-1} + \ldots + a_{k-1} A + a_k E = 0$$

Transforming both sides of this equation by matrix C, we get

$$C^{-1} (a_0 A^k + a_1 A^{k-1} + \ldots + a_{k-1} A + a_k E)\, C$$
$$= a_0 (C^{-1} A C)^k + a_1 (C^{-1} A C)^{k-1} + \ldots + a_{k-1} (C^{-1} A C) + a_k E$$
$$= a_0 B^k + a_1 B^{k-1} + \ldots + a_{k-1} B + a_k E = 0$$

i.e. $f(B) = 0$.

From this it follows that similar matrices have one and the same
minimal polynomial.

Now let $\varphi$ be a linear transformation of an n-dimensional linear
space over the field P. The matrices that represent this transformation
in different bases of the space are similar. The common minimal
polynomial of these matrices is termed the minimal polynomial of
the linear transformation $\varphi$.

Using the operations (on linear transformations) introduced
in Sec. 32, we can introduce the concept of the value of the polynomial

$$f(\lambda) = a_0 \lambda^k + a_1 \lambda^{k-1} + \ldots + a_{k-1} \lambda + a_k$$

from the ring $P[\lambda]$ for $\lambda$ equal to the linear transformation $\varphi$; this
is the linear transformation

$$f(\varphi) = a_0 \varphi^k + a_1 \varphi^{k-1} + \ldots + a_{k-1} \varphi + a_k \varepsilon$$

where $\varepsilon$ is the identity transformation.
We furthermore say that the polynomial $f(\lambda)$ is annihilated by the
linear transformation $\varphi$ if

$$f(\varphi) = \omega$$

where $\omega$ is the zero transformation.

If the reader takes into account the relationship between operations
on linear transformations and on matrices, it will be easy
for him to prove that the minimal polynomial of the linear
transformation $\varphi$ is that uniquely determined polynomial of minimum degree
with leading coefficient 1 which is annihilated by the transformation $\varphi$.
After that the results obtained above, in particular the
Cayley-Hamilton theorem, can be rephrased in the language of linear
transformations.


CHAPTER 14 


GROUPS 


63. Definition of a Group. Examples 


Rings and fields, which played so important a role in the previous
chapters, are algebraic systems with two independent operations:
addition and multiplication. However, there are many areas of
mathematics and its applications in which we very often encounter
algebraic systems with only one algebraic operation defined. Thus,
confining ourselves to examples that have already appeared in this
book, we have the set of permutations of degree n (see Sec. 3) in
which we defined the single operation of multiplication. On the
other hand, the definition of a vector space (Sec. 8) includes the
addition of vectors, whereas multiplication of vectors was not
defined (notice that the multiplication of a vector by a scalar does
not satisfy the definition, given in Sec. 44, of an algebraic
operation).

Groups form the most important type of algebraic systems with 
a single operation. This concept has extensive applications and forms 
the subject of a whole science—the theory of groups. The present 
chapter may be regarded as an introduction to the theory of groups, 
including such elementary facts about groups as are needed by every 
mathematician and also, at the end, a theorem that is not so ele- 
mentary. 

Let us agree, as is the custom in group theory, to call the algeb- 
raic operation at hand multiplication and to use appropriate symbo- 
lism. It will be recalled (see Sec. 44) that an algebraic operation is 
always assumed to be valid and unique: for any two elements a and b 
of a given set the product ab exists and is a uniquely defined element 
of the set. 

A group is a set G with one algebraic operation that is associative
(though not necessarily commutative); the operation must have
an inverse.

Because of the possible noncommutativity of the group operation,
the possibility of the inverse operation signifies the following:
for any two elements a and b in G there exist in G a uniquely defined
element x and a uniquely defined element y such that

$$ax = b, \qquad ya = b$$


If a group G consists of a finite number of elements, then it is 
called a finite group, and the number of elements in it is the order 
of the group. If the operation defined in G is commutative, then G 
is called a commutative group or an Abelian group. 

Some simple consequences follow from the definition of a group.
On the basis of reasoning already given in Sec. 44, we can assert
that the associative law permits speaking in unique fashion about
the product of any finite number of elements of a group specified (due
to the possible noncommutativity of the group operation) in a
definite order.

Let us examine the consequences which follow from the existence 
of the inverse operation. 

Let an arbitrary element a be given in a group G. From the
definition of a group there follows the existence in G of a uniquely
defined element $e_a$ such that $a e_a = a$; thus, this element plays
the role of unity (identity) when multiplied by element a.
If b is any other element of G and if y is a group element satisfying
the equation ya = b (its existence follows from the definition of a group),
we get

$$b = ya = y (a e_a) = (ya)\, e_a = b e_a$$

Thus, the element $e_a$ plays the role of a right-identity with respect
to all elements of the group G, and not only with respect to the
initial element a; we therefore denote it by e'. From the
unambiguousness implicit in the definition of the inverse operation follows
the uniqueness of this element.

In similar fashion, we can prove the existence and uniqueness
in the group G of an element e'' that satisfies the condition e''a = a
for all a in G. In fact, the elements e' and e'' coincide, since the
equalities e''e' = e'' and e''e' = e' imply e'' = e'. This proves that in any
group G there is a uniquely defined element e satisfying the condition

$$ae = ea = a$$

for all a in G. This element is termed the unit (identity) element
of G and is ordinarily denoted by the symbol 1.

From the definition of a group there also follows the existence
and uniqueness, for a given element a, of elements a' and a'' such that

$$aa' = 1, \qquad a''a = 1$$

Actually, the elements a' and a'' coincide; from the equalities

$$a''aa' = a''(aa') = a'' \cdot 1 = a'',$$
$$a''aa' = (a''a)\,a' = 1 \cdot a' = a'$$

follows a'' = a'. This element is called the inverse element of a and
is denoted by $a^{-1}$, that is,

$$a a^{-1} = a^{-1} a = 1$$

Thus, every element of a group has a unique inverse element.

From the foregoing equalities it follows that the inverse of the
element $a^{-1}$ is the element a itself. It is readily seen that the inverse
of a product of several elements is the product of the inverses taken
in the opposite order:

$$(a_1 a_2 \ldots a_{n-1} a_n)^{-1} = a_n^{-1} a_{n-1}^{-1} \ldots a_2^{-1} a_1^{-1}$$

Finally, the unit element is its own inverse.

Checking whether a given set with one operation is a group is
greatly simplified by the fact that in the definition of a group the
requirement that there be an inverse operation can be replaced by the
assumption of the existence of a unit (identity) element and inverse
elements (and only on one side, say, the right, and without any
assumption about their uniqueness). This follows from the theorem
which we will now prove.

A set G with a single associative operation is a group if there is at
least one element e in G with the property

$$ae = a \quad \text{for all } a \text{ in } G$$

and if among the right-identities there is at least one element $e_0$ such
that, relative to it, any element a in G has at least one right-inverse $a^{-1}$:

$$a a^{-1} = e_0$$
Proof. Let $a^{-1}$ be one of the right-inverses of a. Then

$$a a^{-1} = e_0 = e_0 e_0 = e_0 a a^{-1}$$

That is, $a a^{-1} = e_0 a a^{-1}$. Multiplying both sides of this equation
on the right by one of the elements that are right-inverse for $a^{-1}$,
we get $a e_0 = e_0 a e_0$, whence $a = e_0 a$, since $e_0$ is a right-identity of G.
Thus, the element $e_0$ also turns out to be a left-identity of G. Now
if $e_1$ is an arbitrary right-identity, $e_2$ an arbitrary left-identity,
then from the equalities

$$e_2 e_1 = e_2 \quad \text{and} \quad e_2 e_1 = e_1$$

there follows $e_1 = e_2$, i.e., any right-identity is equal to any
left-identity. This completes the proof of the existence and uniqueness, in the
set G, of a unit element (identity) which we denote (as before) by 1.
Furthermore,

$$a^{-1} = a^{-1} \cdot 1 = a^{-1} a a^{-1}$$

That is, $a^{-1} = a^{-1} a a^{-1}$, where $a^{-1}$ is one of the right-inverses for a.
Multiplying both sides of the last equality on the right by one of
the right-inverses of $a^{-1}$, we get $1 = a^{-1} a$, i.e., the element $a^{-1}$ will
also serve as a left-inverse of a. Now, if $a_1^{-1}$ is an arbitrary
right-inverse of a, and $a_2^{-1}$ is an arbitrary left-inverse, then from the equalities

$$a_2^{-1} a a_1^{-1} = (a_2^{-1} a)\, a_1^{-1} = a_1^{-1},$$
$$a_2^{-1} a a_1^{-1} = a_2^{-1} (a a_1^{-1}) = a_2^{-1}$$

there follows $a_1^{-1} = a_2^{-1}$, which is to say, there follows the existence
and uniqueness of the inverse $a^{-1}$ of any element a in G.

It is now easy to show that the set G is a group. Indeed, the
equations ax = b, ya = b will be satisfied, as is readily seen, by the
elements

$$x = a^{-1} b, \qquad y = b a^{-1}$$

The uniqueness of these solutions follows from the fact that if, say,
$a x_1 = a x_2$, then, multiplying both sides of this equation on the
left by $a^{-1}$, we get $x_1 = x_2$. The theorem is proved.
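
For a finite set given together with its operation, the one-sided criterion of the theorem can be tested by direct enumeration. The following Python sketch is an illustration only; the function name `is_group` and the way the operation is passed (as a two-argument function `op`) are assumptions of the sketch. It checks that the operation is algebraic on the set and associative, collects the right-identities, and then looks for a right-identity relative to which every element has a right-inverse.

```python
from itertools import product


def is_group(elements, op):
    """One-sided test: closure, associativity, a right-identity e0,
    and a right-inverse (relative to e0) for every element."""
    elems = list(elements)
    s = set(elems)
    # The operation must be algebraic on the set (products stay inside it).
    if any(op(a, b) not in s for a, b in product(elems, repeat=2)):
        return False
    # Associative law.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # All right-identities: a*e == a for every a.
    right_ids = [e for e in elems if all(op(a, e) == a for a in elems)]
    # Some right-identity e0 must admit right-inverses for every element.
    return any(all(any(op(a, x) == e0 for x in elems) for a in elems)
               for e0 in right_ids)


print(is_group(range(5), lambda a, b: (a + b) % 5))      # residues mod 5, addition
print(is_group(range(5), lambda a, b: (a * b) % 5))      # 0 has no inverse
print(is_group(range(1, 5), lambda a, b: (a * b) % 5))   # nonzero residues mod 5
```

The check costs on the order of the cube of the number of elements because of the associativity test, so it is meant for small examples only.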

We have already encountered the concept of an isomorphism:
for rings, for linear spaces and for Euclidean spaces. This concept
can be defined for groups as well, and it plays just as important
a role in group theory as it does in the theory of rings. Groups G
and G' are termed isomorphic if a one-to-one correspondence can be
established between them such that, under it, for any elements a, b
in G and for the corresponding elements a', b' in G', to the product ab
corresponds the product a'b'. As in Sec. 46 (for the zero element and
the inverse element of a ring), it may be shown that, given an
isomorphic correspondence between groups G and G', the unit element
of G is associated with the unit element of G', and if a in G is
associated with a' in G', then $a^{-1}$ is associated with $(a')^{-1}$.

Passing now to examples of groups, we notice that if the opera- 
tion in the group G is called addition, then the identity (unit) ele- 
ment of the group is zero and is denoted by 0, and in place of the 
inverse element we speak of the opposite element (additive inverse) 
denoted by —a. 

As a first instance of a group, note that, with respect to addition,
any ring (and, in particular, any field) is a group, and indeed an Abelian
group. This is the so-called additive group of a ring. This remark
immediately yields a wealth of concrete examples of groups: the
additive group of integers, the additive group of even numbers,
additive groups of the rational numbers, the reals, the complex
numbers, etc. Note that the additive groups of integers and of even
numbers are isomorphic with each other, although the latter is only
a part of the former: the mapping that associates with every integer k
the even number 2k is one-to-one and, as can easily be verified, is
even an isomorphic mapping of the former group onto the latter.

No ring is a group with respect to multiplication because the
inverse operation (division) is not always possible. The situation
does not change if we pass from an arbitrary ring to a field, since
division by zero is impossible in a field. However, let us consider
the collection of all nonzero elements of a field. Since a field does
not contain divisors of zero (that is, the product of two nonzero
elements is also nonzero), it follows that multiplication is an algebraic
operation for this set: it will be associative and commutative. The
set of all nonzero elements of a field will be closed under division.
Hence, the set of nonzero elements of any field is an Abelian group.
It is called the multiplicative group of the field. Instances of such groups
are the multiplicative groups of the rational numbers, the real
numbers, the complex numbers.

Obviously, all positive real numbers constitute a group with
respect to multiplication. This group is isomorphic to the additive
group of all real numbers: associating the real number ln a with an
arbitrary positive number a, we get a one-to-one mapping of the
first group onto the second group; this mapping is an isomorphism
due to the equality

$$\ln (ab) = \ln a + \ln b$$


Let us now take the set of nth roots of unity in the field of com- 
plex numbers. In Sec. 19 we proved that the product of two nth 
roots of unity and also the inverse of an nth root of unity belong 
to this set of numbers. Since unity, quite naturally, belongs to this 
set and since multiplication of complex numbers is associative and 
commutative, we find that the nth roots of unity constitute an Abelian 
group with respect to multiplication; it is a finite group of order n. 
Thus, for any natural number n there exist finite groups of order n. 

The group (with respect to multiplication) of the nth roots of unity
is isomorphic to the additive group of the ring $Z_n$ constructed in Sec. 45.
Indeed, if $\varepsilon$ is a primitive nth root of unity, then all elements of the
first of these groups are of the form $\varepsilon^k$, $k = 0, 1, \ldots, n-1$. If we
associate with every number $\varepsilon^k$ an element $C_k$ of the ring $Z_n$, i.e.,
the class of integers which yield k as remainder upon division by n,
we get an isomorphic correspondence between the groups under
consideration: if $0 \le k \le n-1$, $0 \le l \le n-1$ and if $k + l = nq + r$,
where $0 \le r \le n-1$ and q is equal to 0 or 1, then
$\varepsilon^k \cdot \varepsilon^l = \varepsilon^r$ and, at the same time, $C_k + C_l = C_r$.
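
The correspondence between the nth roots of unity and the residue classes modulo n can be checked numerically. The following Python sketch is an illustration under stated assumptions: the primitive root is taken to be $\varepsilon = e^{2\pi i/n}$, the names `root_of_unity` and `check_isomorphism` are invented for the sketch, and equality of complex numbers is tested only up to a small tolerance `tol`, since the computation is in floating point.

```python
import cmath


def root_of_unity(n, k):
    """eps**k for the primitive n-th root of unity eps = exp(2*pi*i/n)."""
    return cmath.exp(2j * cmath.pi * k / n)


def check_isomorphism(n, tol=1e-9):
    """Verify eps^k * eps^l == eps^r with r = (k + l) mod n for all k, l."""
    for k in range(n):
        for l in range(n):
            r = (k + l) % n   # the class C_k + C_l = C_r in Z_n
            lhs = root_of_unity(n, k) * root_of_unity(n, l)
            if abs(lhs - root_of_unity(n, r)) > tol:
                return False
    return True


for n in (2, 5, 12):
    print(n, check_isomorphism(n))
```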

At this point, it is worth indicating some numerical sets that 
are not groups. Thus, the set of all integers is not a group with 
respect to multiplication, the set of all positive real numbers is 
not a group with respect to addition, the set of all odd numbers is 
not a group with respect to addition, the set of all negative real 
numbers is not a group with respect to multiplication. All these 
assertions can easily be verified. 

All the numerical groups considered above are of course Abelian.
Instances of Abelian groups not made up of numbers are the linear
spaces: as follows from their definition (see Secs. 29, 47), any linear
space over an arbitrary field P is an Abelian group with respect to the
operation of addition.

Let us now examine examples of noncommutative groups. 

The set of all nth-order matrices over the field P is not a group
with respect to the operation of multiplication since the requirement that
every element have an inverse fails. However, if we confine our attention
to nonsingular matrices, then we get a group. Indeed, the product
of two nonsingular matrices is, as we know, nonsingular, the unit
matrix is nonsingular; every nonsingular matrix has an inverse
which is also nonsingular and, finally, the associative law, which
holds for all matrices, holds true in the particular case of nonsingular
matrices. We can therefore speak of the group of nonsingular matrices
of order n over the field P with matrix multiplication as the group
operation. This group is noncommutative for $n \ge 2$.
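
Noncommutativity is already visible for matrices of order 2. The following Python sketch is an illustration only; the two integer matrices and the helper names `mat_mul` and `det2` are chosen for the example and are not taken from the text.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det2(X):
    """Determinant of a matrix of order 2."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

print(det2(A), det2(B))   # both nonzero, so A and B are nonsingular
print(mat_mul(A, B))      # [[2, 1], [1, 1]]
print(mat_mul(B, A))      # [[1, 1], [1, 2]]
```

The two products differ, although both are again nonsingular.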

The permutations under the multiplication introduced in Sec. 3 furnish
a very important example of a finite noncommutative group. We
know that in the set of all permutations of degree n multiplication
is an algebraic operation which is associative, although for $n \ge 3$
it is noncommutative, that the identity permutation E is the identity
of this multiplication and that every permutation has an inverse.
Thus, the set of permutations of degree n constitutes a group with respect
to multiplication; it is a finite group of order n!. This group is termed
the symmetric group of degree n and is noncommutative for $n \ge 3$.
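
The statements about the symmetric group can be verified by enumeration for small n. The following Python sketch is an illustration; a permutation of the symbols 0, 1, ..., n-1 is stored as a tuple `p` in which `p[i]` is the image of `i`, and the product is read from left to right (first `p`, then `q`), which is assumed here to agree with the convention of Sec. 3. The function names are assumptions of the sketch.

```python
from itertools import permutations
from math import factorial


def compose(p, q):
    """Product pq read left to right: apply p first, then q."""
    return tuple(q[i] for i in p)


def inverse(p):
    """Inverse permutation: interchange arguments and images."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def symmetric_group(n):
    """All permutations of {0, ..., n-1}."""
    return list(permutations(range(n)))


def is_commutative(group):
    return all(compose(p, q) == compose(q, p) for p in group for q in group)


for n in range(1, 5):
    S = symmetric_group(n)
    # degree, whether the order equals n!, whether the group is commutative
    print(n, len(S) == factorial(n), is_commutative(S))
```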

In place of the set of all permutations of degree n, let us consider
only the set of even permutations, which, as we know, consists
of $\frac{1}{2} n!$ elements. Using the theorem, proved in Sec. 3, that the
parity of a permutation coincides with the parity of the number of
transpositions entering into some decomposition of this permutation
into a product of transpositions, we find that the product of two even
permutations is even. Indeed, we obtain the representation of AB
as a product of transpositions by writing the appropriate
decompositions of A and B one after the other. Furthermore, the associativity
of multiplication of permutations is known, and the evenness of the
identity permutation is obvious. Finally, the evenness of the
permutation $A^{-1}$ for the even permutation A follows at least from the
fact that the notations of these permutations may be obtained one
from the other by interchanging the upper and lower rows; that
is to say, they contain an equal number of inversions. Thus, the set
of even permutations of degree n is a finite group of order $\frac{1}{2} n!$ with
respect to multiplication. This group is called the alternating group
of degree n. It is easy to verify that it is noncommutative for $n \ge 4$,
although it is commutative for n = 3.
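
The properties of the alternating group stated above can be checked by enumeration. The following Python sketch is an illustration; permutations are tuples `p` with `p[i]` the image of `i`, parity is decided by counting inversions in the lower row, and all function names are assumptions of the sketch.

```python
from itertools import permutations
from math import factorial


def compose(p, q):
    """Product pq read left to right: apply p first, then q."""
    return tuple(q[i] for i in p)


def is_even(p):
    """Parity of a permutation via the number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if p[i] > p[j])
    return inversions % 2 == 0


def alternating_group(n):
    return [p for p in permutations(range(n)) if is_even(p)]


def is_closed(elements):
    s = set(elements)
    return all(compose(p, q) in s for p in s for q in s)


def is_commutative(elements):
    return all(compose(p, q) == compose(q, p)
               for p in elements for q in elements)


for n in (3, 4):
    A = alternating_group(n)
    # degree, order equals n!/2, closed under multiplication, commutative
    print(n, len(A) == factorial(n) // 2, is_closed(A), is_commutative(A))
```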


Symmetric and alternating groups play a prominent role in the
theory of finite groups and also in the Galois theory. Notice that 
it would be impossible, by analogy with alternating groups, to con- 
struct a group of odd permutations with respect to multiplication, 
since the product of two odd permutations is always an even per- 
mutation. 

A large number of diverse examples of groups are found in the 
various branches of geometry. Just one simple example of this 
nature: the set of all rotations of a sphere about its centre is a group; 
it is noncommutative if we call the result of two successive rotations 
the product of these rotations. 


64. Subgroups 


A subset A of a group G is called a subgroup of this group if it
is a group with respect to the operation defined in G.

To find out whether a subset A of group G is a subgroup of G,
it is sufficient to verify that: (1) the product of any two elements
of A lies in A; (2) A contains, together with every one of its elements,
the inverse of that element. Indeed, from the fact that the associative law holds
in G it follows that it holds for elements in A; the fact that the unit
element of G belongs to A follows from (2) and (1).
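
Conditions (1) and (2) are easy to test mechanically for a finite subset. The following Python sketch is an illustration in the additive group of residues modulo 12; the function name `is_subgroup`, the choice of the group, and the convention that the subset is passed together with the operation and the inverse map are assumptions of the sketch. The subset is assumed to be nonempty, as it is tacitly in the criterion.

```python
def is_subgroup(subset, op, inverse):
    """Conditions (1) and (2) for a nonempty subset of a group."""
    s = set(subset)
    if not s:
        return False
    closed = all(op(a, b) in s for a in s for b in s)   # condition (1)
    has_inverses = all(inverse(a) in s for a in s)      # condition (2)
    return closed and has_inverses


n = 12
add = lambda a, b: (a + b) % n   # the group operation, written additively
neg = lambda a: (-a) % n         # the opposite element

print(is_subgroup(range(0, n, 2), add, neg))   # even residues
print(is_subgroup({0, 4, 8}, add, neg))        # multiples of 4
print(is_subgroup({0, 1, 2}, add, neg))        # fails: 2 + 2 = 4 is missing
```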

Many of the groups named in Sec. 63 are subgroups of other 
groups indicated there. For instance, the additive group of even 
numbers is a subgroup of the additive group of all integers, and the 
latter, in its turn, is a subgroup of the additive group of rational 
numbers. All these groups, like the additive groups of numbers in 
general, are subgroups of the additive group of complex numbers. 
The multiplicative group of positive real numbers is a subgroup 
of the multiplicative group of all nonzero real numbers. The alter- 
nating group of degree n is a subgroup of the symmetric group of 
the same degree. 

There is a point to stress: the requirement contained in the 
definition of a subgroup that the subset A of group G be a group 
with respect to the group operation defined in G is essential. Thus, 
the multiplicative group of positive real numbers is not a subgroup 
of the additive group of all real numbers, although the former set 
is a subset of the latter. 

If we take subgroups A and B in the group G, then their intersection
$A \cap B$, that is, the collection of elements common to A and B, is also
a subgroup of G.

Indeed, if the intersection $A \cap B$ contains elements x and y,
then they lie in the subgroup A and for this reason the product xy
and the inverse $x^{-1}$ belong to A as well. By the same reasoning, the
elements xy and $x^{-1}$ belong to the subgroup B and therefore they
are contained in the intersection $A \cap B$ too.

It is readily seen that this result holds true not only for two
subgroups, but for any number of subgroups, whether finite or even
infinite.

The subset of group G consisting of the single element 1 is obvio- 
usly a subgroup of this group. This subgroup, which is contained 
in any other subgroup of G, is called the unit subgroup of group G. 
On the other hand, the group G itself is one of its own subgroups. 

An interesting example of subgroups are the so-called cyclic
subgroups. Let us introduce the concept of the power of an element
a of group G. If n is any natural number, then the product of n
elements equal to the element a is called the nth power of the element
a and is denoted by $a^n$. Negative powers of element a may be defined
either as elements of group G inverse to the positive powers of this
element or as products of several factors equal to the element $a^{-1}$.
These definitions actually coincide:

$$(a^n)^{-1} = (a^{-1})^n, \qquad n > 0 \qquad (1)$$

To prove this, take the product of 2n factors, of which the first n
are equal to a and the remaining ones are equal to $a^{-1}$, and perform
the cancellations. The element equal both to the left member and
the right member of (1) will be denoted by $a^{-n}$. Finally, let us agree
to use the term zero power $a^0$ of element a for the element 1.

Note that if the operation in the group G is called addition, 
then in place of powers of a we should speak of multiples of this 
element and write ka. 

It is easy to show that in any group G, we have for the powers
of any element a for any exponents m and n (positive, negative,
or zero) the following equalities:

$$a^n \cdot a^m = a^m \cdot a^n = a^{m+n}, \qquad (2)$$
$$(a^n)^m = a^{nm} \qquad (3)$$


We denote by {a} the subset of G composed of all powers of
the element a, including the element a itself as its first power. The
subset {a} is a subgroup of the group G: the product of any two elements
of {a} lies in {a} by (2); {a} has the element 1, equal to $a^0$, and,
finally, {a} contains all its elements together with all the inverse
elements, since from (3) follows the equality

$$(a^n)^{-1} = a^{-n}$$

The subgroup {a} is called a cyclic subgroup of the group G
generated by the element a. As is evident from (2), it is always commutative,
even if the group G itself is noncommutative.

Notice that it has not been asserted above that all powers of the
element a are distinct elements of the group. If this is indeed so,
then a is called an element of infinite order. However, let there be,
among the powers of a, some which are equal, say, $a^k = a^l$ for
$k \ne l$; this is always the case for finite groups, but it may
occur in an infinite group as well. If k > l, then

$$a^{k-l} = 1$$

which is to say that there are positive powers of the element a that
are equal to unity. Let n be the least positive exponent for which the
power of the element a is equal to unity, that is,

(1) $a^n = 1$, n > 0,
(2) if $a^k = 1$, k > 0, then $k \ge n$.

In this case we say that a is an element of finite order, namely, of
order n.
If an element a is of finite order n, then all the elements

$$1, a, a^2, \ldots, a^{n-1} \qquad (4)$$

will be distinct, as is clearly seen. Any other power of the element a,
whether positive or negative, is equal to one of the elements of (4).
Indeed, if k is any integer, then, dividing k by n, we get

$$k = nq + r, \qquad 0 \le r < n$$

and so, by (2) and (3),

$$a^k = (a^n)^q \cdot a^r = a^r \qquad (5)$$

Whence it follows that if the element a is of finite order n, and
$a^k = 1$, then k must be exactly divisible by n. On the other hand,
since

$$-1 = n(-1) + (n - 1)$$

it follows that for the element a of finite order n

$$a^{-1} = a^{n-1}$$


Since the system (4) contains n elements, it follows from the 
results obtained above that for element a of finite order its order n 
coincides with the order (that is to say, with the number of elements) 
of the cyclic subgroup {a}. 

Finally, notice that any group has one and only one element 
of the first order: this is the element 1. The cyclic subgroup {1} 
evidently coincides with the unit subgroup. 
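
The order of an element and its cyclic subgroup can be found by listing successive powers until unity reappears. The following Python sketch is an illustration in the multiplicative group of nonzero residues modulo 7; the function name `cyclic_subgroup` and the choice of the group are assumptions of the sketch, and the loop terminates only for an element of finite order.

```python
def cyclic_subgroup(a, op, identity):
    """Return [1, a, a^2, ..., a^(n-1)] for an element a of finite order n.

    The loop terminates only if a has finite order.
    """
    powers = [identity]
    x = a
    while x != identity:
        powers.append(x)
        x = op(x, a)
    return powers


p = 7
mul = lambda a, b: (a * b) % p   # multiplication of residues modulo 7

for a in range(1, p):
    sub = cyclic_subgroup(a, mul, 1)
    print(a, len(sub), sub)      # element, its order, its cyclic subgroup
```

The length of the returned list is the order n of the element, the entry with index k mod n is the power with exponent k in accordance with (5), and the last entry is the inverse of the element.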

Cyclic groups. A group G is called a cyclic group if it consists 
of the powers of one of its elements a, that is, if it coincides with 
one of its cyclic subgroups {a}; here, the element a is called the 
generator of the group G. It is obvious that every cyclic group is 
Abelian. 

An example of an infinite cyclic group is the additive group of
the integers: any integer is a multiple of the number 1;
that is to say, this number serves as the generator of the group at
hand. We could also take $-1$ for the generator.

An example of a finite cyclic group of order n is the multipli- 
cative group of the nth roots of unity; in Sec. 19 it is shown that 
all these roots are powers of one of them, namely, the primitive root. 

The following theorem shows that, essentially, these examples 
exhaust all cyclic groups. 

All infinite cyclic groups are isomorphic among themselves; all
finite cyclic groups of a given order n are also isomorphic among
themselves.

Indeed, an infinite cyclic group with generator a is mapped
one-to-one onto the additive group of the integers if every element $a^k$
of this group is associated with the number k; this mapping is
isomorphic, since, by (2), in multiplying the powers of the element a
we add the exponents. Now if we are given a finite cyclic group G
of order n with generator a, then we denote by $\varepsilon$ a primitive nth
root of unity and associate with every element $a^k$ of group G,
$0 \le k < n$, the number $\varepsilon^k$. This is a one-to-one mapping of the group
G onto the multiplicative group of the nth roots of unity, the
isomorphic property of which follows from (2) and (5).

This theorem enables us to speak simply about an infinite cyclic 
group or about a cyclic group of order n. 

We now prove the following theorem.

Every subgroup of a cyclic group is itself cyclic.

Indeed, let G = {a} be a cyclic group with generator a (infinite
or finite) and let A be a subgroup of G. We assume that A is
different from the unit subgroup, otherwise there would be nothing to
prove. Suppose that $a^k$ is the least positive power of a contained
in A. There is such a power, since if A contains an element $a^{-s}$,
s > 0, different from 1, then A also contains the inverse element $a^s$.
Assume that A also has an element $a^l$, $l \ne 0$, and k does not divide l.
Then if d, d > 0, is the greatest common divisor of the numbers k
and l, there exist integers u and v such that

$$ku + lv = d$$

and therefore the subgroup A must contain the element

$$(a^k)^u \cdot (a^l)^v = a^d$$

but since under our assumptions d < k, we are in conflict with the
choice of the element $a^k$. This proves that A = $\{a^k\}$.
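
The argument can be followed on a concrete cyclic group. The following Python sketch is an illustration in the additive group of residues modulo 12, so that "powers" become multiples; the function names `extended_gcd` and `generated_subgroup` and the chosen generators are assumptions of the sketch. The first function produces the integers u and v with ku + lv = d, the second builds a subgroup by closing a set of generators under addition.

```python
def extended_gcd(k, l):
    """Return (d, u, v) with k*u + l*v == d == gcd(k, l)."""
    old_r, r = k, l
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def generated_subgroup(n, gens):
    """Additive subgroup of the residues mod n generated by gens."""
    sub = {0}
    frontier = [0]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x + g) % n
            if y not in sub:
                sub.add(y)
                frontier.append(y)
    return sub


n = 12
A = generated_subgroup(n, [8, 6])
k = min(x for x in A if x > 0)                 # least positive element of A
print(sorted(A), k)
print(A == {(k * t) % n for t in range(n)})    # A consists of the multiples of k
print(extended_gcd(8, 6))                      # (d, u, v) with 8*u + 6*v == d
```

In a finite group, closing under the operation alone already yields inverses, which is why `generated_subgroup` does not add opposite elements explicitly.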

Decomposition of a group with respect to a subgroup. If we
take subsets M and N in a group G, then the product MN of these
subsets is to be understood as the collection of elements of G that
are representable in at least one way as the product of an element
of M by an element of N. From the associativity of the group
operation follows the associativity of multiplication of subsets of the group,

$$(MN)\, P = M\, (NP)$$


One of the sets M, N may of course consist of just the one ele- 
ment a. In this case we get the product aN of the element by the set 
or the product Ma of the set by the element. 

Suppose in G we have an arbitrary subgroup A. If x is any element
of G, then the product xA is called the left coset (of the group
G with respect to the subgroup A) generated by element x. The element
x naturally lies in the coset xA since the subgroup A contains a unit
element, and $x \cdot 1 = x$.

Every left coset is generated by any one of its elements, that is to
say, if an element y lies in the coset xA, then

$$yA = xA \qquad (6)$$

This is true because y may be represented as

$$y = xa$$

where a is an element of the subgroup A. Therefore, for any elements
a' and a'' in A it will be true that

$$ya' = x (aa'),$$
$$xa'' = y (a^{-1} a'')$$

which proves (6).
From this it follows that any two left cosets of the group G relative
to the subgroup A either coincide or do not have any element in common.
Indeed, if the cosets xA and yA have a common element z, then

$$xA = zA = yA$$


Thus, the entire group G decomposes into disjoint left cosets 
relative to the subgroup A. This decomposition is called the left 
decomposition of the group G relative to the subgroup A. 

Note that one of the left cosets of this decomposition is the
subgroup A itself; this coset is generated by the element 1 or,
generally, by any element a in A, since

$$aA = A$$


Naturally, taking the product Ax as the right coset of the group
G relative to the subgroup A (this coset being generated by the
element x), we obtain, in similar fashion, a right decomposition of the
group G relative to the subgroup A. For an Abelian group, both its
decompositions (left and right) relative to any subgroup will
naturally coincide, so we can simply speak of the decomposition of a
group relative to a subgroup.

For instance, the decomposition of the additive group of the
integers relative to the subgroup of the multiples of the number k
consists of k distinct cosets generated, respectively, by the numbers
0, 1, 2, ..., k - 1. Here, the coset generated by the number l,
$0 \le l \le k - 1$, contains all the numbers which upon division by
k yield the remainder l.

In the noncommutative case, the decompositions of a group 
relative to a subgroup may prove to be distinct. 

To illustrate, let us consider the symmetric group of degree 3, $S_3$;
as in Sec. 3, we write its elements as cycles. For the subgroup A
we take the cyclic subgroup of the element (12); it consists of the
identity permutation and the permutation (12) itself. The other
left cosets are: (13)·A, consisting of the permutations (13) and (132),
and (23)·A, consisting of the permutations (23) and (123). On the
other hand, the right cosets of the group $S_3$ relative to the subgroup
A are: the subgroup A itself, the coset A·(13), consisting of the
permutations (13) and (123), and the coset A·(23), consisting of
the permutations (23) and (132). We see that in this case, the right
decomposition differs from the left decomposition.
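
The two decompositions of the symmetric group of degree 3 can be recomputed mechanically. The following Python sketch is an illustration; a permutation of the symbols 1, 2, 3 is stored as a tuple `p` with `p[i - 1]` the image of `i`, products are read from left to right (first the left factor, then the right one), which is assumed to be the convention of Sec. 3, and the function names are assumptions of the sketch.

```python
def from_cycle(cycle, n=3):
    """Permutation of {1, ..., n} defined by a single cycle.

    Stored as a tuple p with p[i - 1] equal to the image of i.
    """
    p = list(range(1, n + 1))
    for pos, x in enumerate(cycle):
        p[x - 1] = cycle[(pos + 1) % len(cycle)]
    return tuple(p)


def compose(p, q):
    """Product pq read left to right: apply p first, then q."""
    return tuple(q[x - 1] for x in p)


def left_coset(x, sub):
    return {compose(x, a) for a in sub}


def right_coset(sub, x):
    return {compose(a, x) for a in sub}


E = (1, 2, 3)                      # identity permutation
A = {E, from_cycle((1, 2))}        # cyclic subgroup of the element (12)

for c in [(1, 3), (2, 3)]:
    x = from_cycle(c)
    # generating cycle, left coset xA, right coset Ax
    print(c, sorted(left_coset(x, A)), sorted(right_coset(A, x)))
```

In the printed tuples, (2, 3, 1) is the cycle (123) and (3, 1, 2) is the cycle (132).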

For the case of finite groups, the existence of decompositions 
of a group relative to a subgroup leads to the following important 
theorem. 

Lagrange’s theorem. In every finite group, the order of any sub- 
group is a divisor of the order of the group itself. 

Indeed, in a finite group G of order n let there be given a
subgroup A of order k. We consider the left decomposition of the group
G relative to the subgroup A. Let it consist of j cosets; the number j
is termed the index of the subgroup A in the group G. Every left
coset xA consists of exactly k elements, since if

$$x a_1 = x a_2$$

where $a_1$ and $a_2$ are elements of A, then $a_1 = a_2$. Thus,

$$n = kj \qquad (7)$$

which completes the proof.

Since the order of an element coincides with the order of its
cyclic subgroup, it follows from the Lagrange theorem that the
order of any element of a finite group is a divisor of the order of the
group.
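
Lagrange's theorem and its corollary on the orders of elements can be observed in the symmetric group of degree 4. The following Python sketch is an illustration; permutations of the symbols 0, 1, 2, 3 are stored as tuples, products are read from left to right, and the function names are assumptions of the sketch. For every element a it forms the cyclic subgroup {a}, builds the left decomposition, and checks the equality n = kj of (7).

```python
from itertools import permutations


def compose(p, q):
    """Product pq read left to right: apply p first, then q."""
    return tuple(q[i] for i in p)


def cyclic_subgroup(a):
    """Powers of a, starting from the identity permutation."""
    identity = tuple(range(len(a)))
    sub, x = [identity], a
    while x != identity:
        sub.append(x)
        x = compose(x, a)
    return sub


def left_cosets(group, sub):
    """Left decomposition of the group relative to the subgroup."""
    cosets, seen = [], set()
    for x in group:
        if x not in seen:
            coset = frozenset(compose(x, a) for a in sub)
            cosets.append(coset)
            seen |= coset
    return cosets


G = list(permutations(range(4)))   # symmetric group of degree 4, order 24
orders = set()
lagrange_holds = True
for a in G:
    A = cyclic_subgroup(a)
    cosets = left_cosets(G, A)
    same_size = all(len(c) == len(A) for c in cosets)   # every coset has k elements
    lagrange_holds = lagrange_holds and same_size and len(cosets) * len(A) == len(G)
    orders.add(len(A))

print(lagrange_holds)
print(sorted(orders))   # orders of elements, each a divisor of 24
```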

It also follows from the Lagrange theorem that any finite group 
whose order is a prime number is cyclic. 

Indeed, this group must coincide with the cyclic subgroup
generated by any element of it that is different from unity.

Hence, by the above-obtained description of cyclic groups,
it follows that for any prime p there is a unique, to within
isomorphism, finite group of order p.


65. Normal Divisors, Factor Groups, Homomorphisms


A subgroup A of a group G is called a normal divisor of this group
(or an invariant subgroup) if the left decomposition of G with respect
to A coincides with the right decomposition.

Thus, all subgroups of an Abelian group are normal divisors 
in it. On the other hand, in any group G both the unit subgroup and 
the group itself are normal divisors: both decompositions of G with 
respect to the unit subgroup coincide with the decomposition of 
the group into separate elements, and both decompositions of the 
group G with respect to the group itself consist of the single coset G. 

Here are some of the more interesting examples of normal divisors 
in noncommutative groups. In the symmetric group S_3 of degree 3,
the cyclic subgroup of the element (123), consisting of the identity
permutation and the permutations (123) and (132), is a normal divisor:
in both decompositions of the group S_3 with respect to this subgroup,
the second coset consists of the permutations (12), (13) and (23).
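
The definition can be tested directly by comparing the two decompositions. The sketch below is an illustration under stated assumptions, not the author's construction: permutations of {0, 1, 2} are encoded as tuples, and the names compose, cosets and is_normal are hypothetical helpers.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply p first, then q' (a convention assumed here)."""
    return tuple(q[p[i]] for i in range(len(p)))

def cosets(G, A, side):
    """Set of left cosets xA (side='left') or right cosets Ax (side='right')."""
    if side == "left":
        return {frozenset(compose(x, a) for a in A) for x in G}
    return {frozenset(compose(a, x) for a in A) for x in G}

def is_normal(G, A):
    """A is a normal divisor if the left and right decompositions coincide."""
    return cosets(G, A, "left") == cosets(G, A, "right")

S3 = list(permutations(range(3)))
e = (0, 1, 2)
rot = (1, 2, 0)                      # a cycle of length 3
A3 = [e, rot, compose(rot, rot)]     # cyclic subgroup of order 3
H = [e, (1, 0, 2)]                   # subgroup of order 2

print(is_normal(S3, A3), is_normal(S3, H))  # True False
```

The subgroup of order 3 gives the same two cosets on either side, whereas for the subgroup of order 2 the two decompositions differ, as in the example at the end of the preceding section.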

Generally, in the symmetric group S_n of degree n the alternating
group A_n of degree n is a normal divisor. Indeed, the group A_n
is of order n!/2, and so any coset of the group S_n with respect to the
subgroup A_n must consist of the same number of elements and,
consequently, there is only one other such coset, namely, the collection
of odd permutations.

In the multiplicative group of nonsingular square matrices of 
order n with elements in the field P, those matrices whose determi- 
nants equal 1 obviously constitute a subgroup. It will even be a 
normal divisor, since the class of all matrices whose determinants 
are equal to the determinant of the matrix M is the coset (simul- 
taneously left and right) with respect to this subgroup, which coset 
is generated by the matrix M. It suffices to recall that in the multi- 
plication of matrices the determinants are multiplied together. 

The definition of a normal divisor given above may be rephrased. 

A subgroup A of group G is a normal divisor of this group if
for any element x in G

xA = Ax (1)

That is to say, for any element x in G and any element a in A, it is
possible to choose elements a' and a'' in A such that

xa = a'x, ax = xa'' (2)


There are other definitions of a normal divisor equivalent to the 
original one. Thus, we call elements a and b of group G conjugate 
if in G there is at least one element x such that

b = x^{-1}ax (3)

or we say that element b is the transform of element a by x. From (3)
it evidently follows that

a = xbx^{-1} = (x^{-1})^{-1} b x^{-1}


A subgroup A of a group G is a normal divisor in G if and only 
if, together with any element of it, a, it also contains all elements con- 
jugate to it in G. 

Indeed, if A is a normal divisor in G, then, by (2), for the element
a that we chose in A and for any element x in G we can find in A
an element a'' such that

ax = xa''

Whence

x^{-1}ax = a''


That is, any element conjugate to a lies in A. Conversely, if a
subgroup A contains, together with any element a, all elements
conjugate to a, then in particular A also contains the element

x^{-1}ax = a''

whence follows the second of the equalities (2). For the same reason,
A also contains the element

(x^{-1})^{-1} a x^{-1} = xax^{-1} = a'


whence follows the first of the equalities (2). 
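
The conjugation criterion lends itself to a direct check. This is a small sketch under the assumption that permutations of {0, 1, 2} are stored as tuples; compose, inverse and closed_under_conjugation are illustrative names, not notation from the text.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply p first, then q' (a convention assumed here)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def closed_under_conjugation(G, A):
    """True if x^{-1} a x lies in A for every x in G and every a in A."""
    A_set = set(A)
    return all(compose(compose(inverse(x), a), x) in A_set for x in G for a in A)

S3 = list(permutations(range(3)))
e = (0, 1, 2)
rot = (1, 2, 0)
A3 = [e, rot, compose(rot, rot)]
H = [e, (1, 0, 2)]

print(closed_under_conjugation(S3, A3), closed_under_conjugation(S3, H))  # True False
```

The answers agree with the comparison of left and right decompositions, as the criterion just proved requires.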

Using this result, it is easy to prove that the intersection of any 
normal divisors of group G will itself be a normal divisor of this group. 
Indeed, if A and B are normal divisors of G, then, as demonstrated 
in the preceding section, the intersection A ∩ B is a subgroup of G.
Let c be any element of A ∩ B and x any element of G. Then the
element x^{-1}cx must lie both in A and B since both of these normal
divisors contain the element c. Whence it follows that the element
x^{-1}cx is in the intersection A ∩ B.

Factor group. The significance of the concept of a normal divisor 
is based on the fact that it is possible, in a certain very natural way, 
to construct a new group from the cosets with respect to a normal 
divisor—due to (1) there is no need in this case to distinguish between 
left and right cosets. 

First notice that if A is an arbitrary subgroup of the group G, 
then 


AA=A (4) 


since the product of any two elements of the subgroup A belongs 
to A and, at the same time, by multiplying all elements of A by 
the unit element we already get the entire subgroup A. 

Let A now be a normal divisor of G. In this case, the product of
any two cosets of G with respect to A (in the sense of multiplying
subsets of the group G) will itself be a coset with respect to A. Indeed,
using the associativity of the multiplication of subsets of a group,
and using equality (4) and

yA = Ay

[cf. (1)], we get

xA · yA = xyAA = xyA (5)

for any elements x and y of G.

Equation (5) shows that in order to find the product of two given 
cosets of group G with respect to the normal divisor A, we must 
choose in arbitrary fashion one representative in each coset (recall 
that every coset is generated by any one of its elements) and take 
the coset containing the product of these representatives. 

Thus is defined the operation of multiplication in the set of 
all cosets of the group G with respect to the normal divisor A. We 
will show that all the requirements that enter into the definition of 
a group are thus fulfilled. The associativity of multiplication of 
cosets follows from the associativity of the multiplication of sub- 
sets of the group. The role of the unit element is played by the 
normal divisor A itself, which is one of the cosets of the decomposi- 
tion of G with respect to A: namely, by (4) and (1), it is true that 
for any x in G,

xA · A = xA, A · xA = xAA = xA

Finally, the coset x^{-1}A is the inverse of the coset xA since

xA · x^{-1}A = 1 · A = A


The group thus constructed is called the factor group of the group 
G with respect to the normal divisor A and is denoted G/A. 

We see that every group is associated with a whole set of new 
groups—its factor groups with respect to different normal divisors. 
Here, the factor group of the group G with respect to the unit sub- 
group will, naturally, be isomorphic to G itself. 
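
The multiplication of cosets as subsets can be tried out on the symmetric group of degree 3 and its subgroup of order 3. The following is only a sketch: the tuple encoding of permutations and the helper names compose and subset_product are assumptions of the example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply p first, then q' (a convention assumed here)."""
    return tuple(q[p[i]] for i in range(len(p)))

def subset_product(X, Y):
    """Product of two subsets: all products xy with x in X and y in Y."""
    return frozenset(compose(x, y) for x in X for y in Y)

S3 = list(permutations(range(3)))
rot = (1, 2, 0)
A3 = frozenset([(0, 1, 2), rot, compose(rot, rot)])

# The cosets xA, and the products of all pairs of cosets.
cosets = {subset_product([x], A3) for x in S3}
table = {(C, D): subset_product(C, D) for C in cosets for D in cosets}

print(len(cosets))                                   # 2
print(all(prod in cosets for prod in table.values()))  # True
```

The product of any two cosets is again a coset, the subgroup itself acts as the unit element, and the factor group obtained has order 2.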

Every factor group G/A of an Abelian group G is itself Abelian, 
since from xy = yx it follows that

xA · yA = xyA = yxA = yA · xA


Every factor group G/A of a cyclic group G is cyclic, because if G
is generated by an element g, G = {g}, and if we are given an
arbitrary coset xA, then there is an integer k such that

x = g^k

and so

xA = (gA)^k


The order of any factor group G/A of a finite group G is a divisor 
of the order of the group itself. Indeed, the order of the factor group 
G/A is equal to the index of the normal divisor A in the group G,
and so we can take advantage of (7) of the preceding section. 

Here are some instances of factor groups. Since, in the additive 
group of the integers, the subgroup of multiples of the natural num- 
ber k has, as shown in the preceding section, index k, the factor 
group of our group with respect to this subgroup is a finite group 
of order k; it is a cyclic group because the group under consideration 
is itself cyclic. 

The factor group of the symmetric group S_n of degree n with respect
to the alternating group A_n of degree n is a group of order 2; because
2 is prime, it is a cyclic group (see the end of the preceding section).

We have already given a description of the cosets of the multi- 
plicative group of nonsingular matrices of order n with elements 
in the field P with respect to the normal divisor composed of matrices 
whose determinants are equal to 1. From this description it follows 
that the corresponding factor group is isomorphic to the multiplica- 
tive group of nonzero numbers of P. 

Homomorphisms. The concepts of a normal divisor and a factor 
group are closely connected with the following generalization of 
the concept of an isomorphism. 

A mapping φ of a group G onto a group G' such that to every
element a of G there corresponds a unique element a' = aφ in G'
is called a homomorphic mapping of G onto G' (or simply a
homomorphism) if in this mapping every element a' of G' is an image of
some element a in G, a' = aφ, and if for any elements a, b of G,

(ab)φ = aφ · bφ

It is quite obvious that if we also required the mapping φ to be
one-to-one, we would obtain the already familiar definition
of an isomorphism.

If φ is a homomorphism of group G onto group G', and 1 and a are,
respectively, the unit element and an arbitrary element of G, and 1'
is the unit element of G', then

1φ = 1',
(a^{-1})φ = (aφ)^{-1}

Indeed, if 1φ = e' and x' is an arbitrary element of the group G',
then there is an element x in G such that xφ = x'. Whence,

x' = xφ = (x · 1)φ = xφ · 1φ = x'e'

Similarly,

x' = e'x'

and, hence, e' = 1'.
On the other hand, if (a^{-1})φ = b', then

1' = 1φ = (aa^{-1})φ = aφ · (a^{-1})φ = aφ · b'

and, similarly,

1' = b' · aφ

whence b' = (aφ)^{-1}.

Let us use the term kernel of a homomorphism φ of a group G
onto a group G' for the set of elements of G which are mapped under
φ into the unit element 1' of G'.

The kernel of any homomorphism φ of a group G is a normal divisor
of G.

Indeed, if the elements a, b of G enter into the kernel of the
homomorphism φ, i.e.,

aφ = bφ = 1'

then

(ab)φ = aφ · bφ = 1' · 1' = 1'

That is to say, the product ab is also contained in the kernel of the
homomorphism φ. On the other hand, if aφ = 1', then

(a^{-1})φ = (aφ)^{-1} = 1'^{-1} = 1'

which is to say that a^{-1} is also in the kernel of the homomorphism φ.
Finally, if aφ = 1', and x is an arbitrary element of the group G,
then

(x^{-1}ax)φ = (x^{-1})φ · aφ · xφ = (xφ)^{-1} · 1' · xφ = 1'


The kernel of the homomorphism under consideration turned out to 
be a subgroup of the group G, which contains all the elements con- 
jugate to any one of its elements; hence, it is a normal divisor. 

Now let A be an arbitrary normal divisor of the group G.
Associating every element x of G with that coset xA with respect to the
normal divisor A in which the element lies, we obtain a mapping
of the group G onto the entire factor group G/A. From the definition
of multiplication in the group G/A [see (5)], it follows that this
mapping is homomorphic.

The resulting homomorphism is the canonical homomorphism 
of the group G onto the factor group G/A. The normal divisor A 
is itself obviously the kernel of this homomorphism. 

From this it follows that only the normal divisors of the group 
G serve as kernels of the homomorphisms of this group. This result can 
be regarded as yet another definition of a normal divisor. 

It appears that all groups onto which the group G can be homo- 
morphically mapped are actually exhausted by the factor groups 
of this group, and all the homomorphisms of G are exhausted by its 
canonical homomorphisms onto its factor groups. To be more precise, 
the following theorem holds. 

Theorem on homomorphisms. Suppose we have a homomorphism
φ of a group G onto a group G'; let A be the kernel of this homomorphism.
Then the group G' is isomorphic to the factor group G/A; there exists
an isomorphic mapping σ of the former of these groups onto the latter
such that the result of the successive mappings φ and σ coincides with
the canonical homomorphism of the group G onto the factor group G/A.

Indeed, let x' be an arbitrary element of G', and let x be an
element of G such that xφ = x'. Since for any element a of the kernel
A of the homomorphism φ we have the equality aφ = 1', it follows
that

(xa)φ = xφ · aφ = x' · 1' = x'

That is, all elements of the coset xA are mapped under φ into the
element x'.

On the other hand, if z is any element of the group G such that
zφ = x', then

(x^{-1}z)φ = (x^{-1})φ · zφ = (xφ)^{-1} · zφ = x'^{-1} x' = 1'

That is to say, x^{-1}z is contained in the kernel A of the homomorphism
φ. If we set x^{-1}z = a, then z = xa, or the element z is contained in
the coset xA. Thus, collecting all the elements of the group G which
are mapped under the homomorphism φ into the fixed element x'
of the group G', we get precisely the coset xA.

The correspondence σ, which associates every element x' of G'
with that coset of G by the normal divisor A which consists of all
elements of G having x' as their image under φ, is a one-to-one mapping
of the group G' onto the group G/A. This mapping σ is an isomorphism
since if

x'σ = xA, y'σ = yA

that is,

xφ = x', yφ = y'

then

(xy)φ = xφ · yφ = x'y'

and so

(x'y')σ = xyA = xA · yA = x'σ · y'σ

Finally, if x is an arbitrary element in G and xφ = x', then

(xφ)σ = x'σ = xA

That is, a successive execution of the homomorphism φ and the
isomorphism σ indeed maps the element x into the coset xA generated
by it. The theorem is proved.
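
The theorem can be followed on a concrete homomorphism. The sketch below takes, as an assumed example, the mapping of the symmetric group of degree 3 onto the multiplicative group {1, −1} that sends each permutation to its sign; the names sign, kernel and sigma are chosen only for this illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply p first, then q' (a convention assumed here)."""
    return tuple(q[p[i]] for i in range(len(p)))

def sign(p):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1

S3 = list(permutations(range(3)))

# phi = sign is a homomorphism of S3 onto {1, -1}.
is_hom = all(sign(compose(a, b)) == sign(a) * sign(b) for a in S3 for b in S3)

# The kernel: all elements mapped into the unit element 1.
kernel = frozenset(a for a in S3 if sign(a) == 1)

# sigma associates with each image x' the set of all elements mapped into x'.
sigma = {image: frozenset(x for x in S3 if sign(x) == image) for image in (1, -1)}

# Each such set is precisely the coset xA of any one of its elements x.
fibers_are_cosets = all(
    sigma[sign(x)] == frozenset(compose(x, a) for a in kernel) for x in S3
)

print(is_hom, fibers_are_cosets)  # True True
```

Here the kernel is the alternating group of degree 3, and the two complete preimages are the two cosets with respect to it, so that the image group of order 2 is matched one-to-one with the factor group.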


66. Direct Sums of Abelian Groups 


We would like to conclude this chapter with a group-theoretic 
theorem that is deeper than the elementary properties of groups given 
above. Namely, proceeding from the description, given in Sec. 64, 
of cyclic groups, we will obtain in the next section a complete
description of finite Abelian groups.

As is customary in the theory of Abelian groups, we use the
additive notation for the group operation: we shall speak of the 
sum a + b of elements a and b of the group, of the zero subgroup 0, 
of the multiples ka of some element a, etc. 

We will examine in this section a construction that will be 
described in detail in application to Abelian groups, though it could 
have been introduced at once for arbitrary (that is, not necessarily 
commutative) groups. This construction is suggested by the follow- 
ing examples. A plane regarded as a two-dimensional real linear 
space is an Abelian group with respect to the addition of vectors. 
Any straight line in this plane passing through the coordinate origin 
is a subgroup of the indicated group. If A_1 and A_2 are two distinct
straight lines of this kind, then, as we know, any vector in the plane
that issues from the origin is uniquely represented by the sum of its
projections on the straight lines A_1 and A_2. Similarly, any vector
of three-dimensional linear space can be uniquely written as the
sum of three vectors belonging to three given straight lines A_1, A_2,
and A_3, provided the lines do not lie in the same plane.

An Abelian group G is called the direct sum of its subgroups A_1,
A_2, ..., A_k,

G = A_1 + A_2 + ... + A_k (1)

if every element x of G is uniquely written as the sum of the elements
a_1, a_2, ..., a_k, taken, respectively, in the subgroups A_1, A_2, ..., A_k:

x = a_1 + a_2 + ... + a_k (2)

The notation (1) is called the direct decomposition of the group
G, the subgroups A_i, i = 1, 2, ..., k, are direct summands of this
decomposition, and the element a_i in (2) is a component of the
element x in the direct summand A_i of the decomposition (1),
i = 1, 2, ..., k.

If we are given a direct decomposition (1) of a group G and if the
direct summands A_i of this decomposition (all or some of them) are
themselves decomposed into a direct sum,

A_i = A_{i1} + A_{i2} + ... + A_{ik_i}, k_i ≥ 1 (3)

then the group G is the direct sum of all its subgroups
A_{ij}, j = 1, 2, ..., k_i, i = 1, 2, ..., k.

Indeed, for an arbitrary element x of G we have the notation (2)
relative to the direct decomposition (1), and for each component a_i,
i = 1, 2, ..., k, we have the notation

a_i = a_{i1} + a_{i2} + ... + a_{ik_i} (4)

relative to the direct decomposition (3) of the group A_i. It is clear
that x is the sum of all the elements a_{ij}, j = 1, 2, ..., k_i,
i = 1, 2, ..., k. The uniqueness of this notation follows from the
fact that we must obtain precisely equality (2) by taking any
notation of the element x as a sum of elements, taken one each in the
subgroups A_{ij}, and by adding the summands belonging to the same
subgroup A_i, i = 1, 2, ..., k. On the other hand, each element a_i
only has one notation of the type (4).

The definition of a direct sum may be restated. First let us
introduce a new concept. If it is given that an Abelian group G has certain
subgroups B_1, B_2, ..., B_l, then we denote by {B_1, B_2, ..., B_l} the
set of elements y of G which can in at least one way be written as
a sum of the elements b_1, b_2, ..., b_l, taken in the subgroups
B_1, B_2, ..., B_l, respectively,

y = b_1 + b_2 + ... + b_l (5)

The set {B_1, B_2, ..., B_l} will be a subgroup of G. We say that
this subgroup is generated by the subgroups B_1, B_2, ..., B_l.

For the proof, let us take in {B_1, B_2, ..., B_l} an element y
with notation (5), and also an element y' with a similar notation,

y' = b_1' + b_2' + ... + b_l'

where b_i' is an element in B_i, i = 1, 2, ..., l. Then

y + y' = (b_1 + b_1') + (b_2 + b_2') + ... + (b_l + b_l')
−y = (−b_1) + (−b_2) + ... + (−b_l)

which is to say that the elements y + y' and −y also have at least
one notation of the type (5) and, hence, belong to the set
{B_1, B_2, ..., B_l}, which completes the proof.

The subgroup {B_1, B_2, ..., B_l} contains each of the subgroups
B_i, i = 1, 2, ..., l. Indeed, every subgroup of the group G
contains the zero element of this group and so, taking, for instance,
in the subgroup B_1 any element b_1, and in the subgroups B_2, ..., B_l
the element 0, we obtain the following notation of type (5) for the
element b_1:

b_1 = b_1 + 0 + ... + 0


An Abelian group G is the direct sum of its subgroups A_1, A_2, ..., A_k
if and only if it is generated by these subgroups,

G = {A_1, A_2, ..., A_k} (6)

and if the intersection of each subgroup A_i, i = 2, ..., k, with the
subgroup generated by all preceding subgroups A_1, A_2, ..., A_{i-1}
contains zero alone:

{A_1, A_2, ..., A_{i-1}} ∩ A_i = 0, i = 2, ..., k (7)


Indeed, if the group G has the direct decomposition (1), then
for any element x of G the notation (2) exists, and therefore we have
equation (6). The validity of equations (7) follows from the uniqueness
of the notation (2) for any element x: if for some i the intersection
{A_1, A_2, ..., A_{i-1}} ∩ A_i contained a nonzero element x, then,
on the one hand, x could be written as an element a_i in A_i, i.e.,
x = a_i, and so

x = 0 + ... + 0 + a_i + 0 + ... + 0 (8)

On the other hand, x, as an element of the subgroup {A_1, A_2, ..., A_{i-1}},
would have a notation of the form

x = a_1' + a_2' + ... + a_{i-1}'

which is to say that

x = a_1' + a_2' + ... + a_{i-1}' + 0 + ... + 0 (9)

It is evident that (8) and (9) are two distinct notations of type (2)
for the element x.

Conversely, let (6) and (7) hold. From (6) it follows that any
element x of G has at least one notation of type (2). However, let
there be two distinct notations of type (2) for some element x:

x = a_1 + a_2 + ... + a_k = a_1' + a_2' + ... + a_k' (10)

Then we can find an i, i ≤ k, such that

a_k = a_k', a_{k-1} = a_{k-1}', ..., a_{i+1} = a_{i+1}' (11)

but

a_i ≠ a_i'

That is,

a_i − a_i' ≠ 0 (12)

From (10) and (11) follows, however, the equality

a_i − a_i' = (a_1' − a_1) + (a_2' − a_2) + ... + (a_{i-1}' − a_{i-1})

which contradicts (7) due to (12). The theorem is proved.
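
The criterion can be tried on the additive group of residues modulo 12. This is a minimal sketch under assumptions of our own: the group is represented by the numbers 0, 1, ..., 11 with addition modulo 12, and cyclic_subgroup and is_direct_sum are hypothetical helper names.

```python
from itertools import product

def cyclic_subgroup(g, n):
    """Cyclic subgroup generated by g in the additive group of residues modulo n."""
    return sorted({(m * g) % n for m in range(n)})

def is_direct_sum(n, subgroups):
    """True if every residue modulo n has exactly one notation of type (2)."""
    counts = [0] * n
    for combo in product(*subgroups):
        counts[sum(combo) % n] += 1
    return all(c == 1 for c in counts)

B = cyclic_subgroup(4, 12)   # [0, 4, 8]
C = cyclic_subgroup(3, 12)   # [0, 3, 6, 9]
D = cyclic_subgroup(2, 12)   # [0, 2, 4, 6, 8, 10]

print(is_direct_sum(12, [B, C]), set(B) & set(C))  # True {0}
print(is_direct_sum(12, [D, C]), set(D) & set(C))  # False {0, 6}
```

In the first case the intersection is zero and the decomposition is direct; in the second the two subgroups still generate the whole group, but their intersection contains 6, and the notation of type (2) is no longer unique.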
The concept of a direct sum may be regarded from quite a different
angle. Suppose we have k arbitrary Abelian groups A_1, A_2, ..., A_k,
among which there may be isomorphic groups. Denote
by G the set of all possible systems of the form

(a_1, a_2, ..., a_k) (13)

composed of elements taken one at a time in each of the groups
A_1, A_2, ..., A_k. The set G will become an Abelian group if
addition of the systems of type (13) is defined by the following rule:

(a_1, a_2, ..., a_k) + (a_1', a_2', ..., a_k')
= (a_1 + a_1', a_2 + a_2', ..., a_k + a_k') (14)
That is, the elements are combined separately in each of the given
groups A_1, A_2, ..., A_k. Indeed, the associativity and commutativity
of this addition follows from the validity of these properties in
each of the specified groups; the role of zero is played by the system

(0_1, 0_2, ..., 0_k)

where 0_i denotes the zero element of the group A_i, i = 1, 2, ..., k.
The inverse of (13) is the system

(−a_1, −a_2, ..., −a_k)

The Abelian group G thus constructed is called the direct sum
of the groups A_1, A_2, ..., A_k and is written, as above,

G = A_1 + A_2 + ... + A_k


This name is justified by the fact that the group G, which is the direct
sum of the groups A_1, A_2, ..., A_k in the sense just defined, can be
decomposed into the direct sum of its subgroups A_1', A_2', ..., A_k', which
are isomorphic, respectively, to the groups A_1, A_2, ..., A_k.
Namely, denote by A_i', i = 1, 2, ..., k, the set of elements of G,
that is, systems of type (13), with an arbitrary element a_i of group
A_i in the ith position, all other positions being occupied by zeros
of the corresponding groups; these will thus be systems of the form

(0_1, ..., 0_{i-1}, a_i, 0_{i+1}, ..., 0_k) (15)

The definition (14) of addition shows that the set A_i' is a subgroup
of the group G. We obtain the isomorphism of this subgroup and the
group A_i by associating to each system (15) the element a_i of group A_i.
It remains to prove that the group G is the direct sum of the
subgroups A_1', A_2', ..., A_k'. Indeed, any element (13) of G may be
represented as a sum of elements of the indicated subgroups:

(a_1, a_2, ..., a_k) = (a_1, 0_2, ..., 0_k)
+ (0_1, a_2, 0_3, ..., 0_k) + ... + (0_1, 0_2, ..., 0_{k-1}, a_k)

The uniqueness of this representation follows from the fact that
distinct systems of type (15) are distinct elements of the group G.
If we have two systems of Abelian groups, A_1, A_2, ..., A_k and
B_1, B_2, ..., B_k, and the groups A_i and B_i are isomorphic,
i = 1, 2, ..., k, then the groups

G = A_1 + A_2 + ... + A_k

and

H = B_1 + B_2 + ... + B_k

are also isomorphic.

Indeed, if for i = 1, 2, ..., k there is established, between
groups A_i and B_i, an isomorphism φ_i, which associates with each
element a_i of A_i an element a_iφ_i of B_i, then the mapping φ, which
associates with every element (a_1, a_2, ..., a_k) of G an element of H
defined by the equation

(a_1, a_2, ..., a_k)φ = (a_1φ_1, a_2φ_2, ..., a_kφ_k),

will obviously be an isomorphic mapping of the group G onto the
group H.
If we have finite Abelian groups A_1, A_2, ..., A_k of orders n_1, n_2, ...,
n_k, respectively, then the direct sum G of these groups is also a
finite group and its order n is equal to the product of the orders of the
direct summands,

n = n_1 n_2 ... n_k (16)

Quite true, since the number of distinct systems of type (13)
whose element a_1 can assume n_1 distinct values, whose element a_2
can assume n_2 distinct values, and so on, is determined by
equation (16).
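
The construction by systems (13) with the addition rule (14) can be written out directly. In the sketch below the summands are assumed, for the sake of the example, to be the additive groups of residues modulo 2, 3 and 4; the names orders, add and negate are illustrative.

```python
from itertools import product

# Orders n_1, n_2, n_3 of three cyclic summands (an arbitrary choice).
orders = (2, 3, 4)

# All systems (a_1, a_2, a_3) with a_i taken in the i-th group.
G = list(product(*(range(n) for n in orders)))

def add(x, y):
    """Componentwise addition of two systems, as in rule (14)."""
    return tuple((a + b) % n for a, b, n in zip(x, y, orders))

def negate(x):
    """The inverse system (-a_1, -a_2, -a_3)."""
    return tuple((-a) % n for a, n in zip(x, orders))

zero = tuple(0 for _ in orders)
print(len(G))  # 24
```

The number of systems is 2 · 3 · 4 = 24, in agreement with (16).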

Let us consider some examples. 

If the order n of a finite cyclic group {a} can be decomposed into
the product of two relatively prime natural numbers,

n = st, (s, t) = 1

then the group {a} can be decomposed into the direct sum of two cyclic
groups having orders s and t, respectively.
Let us use the additive notation for the group {a}. If we set
b = ta, then

sb = (st)a = na = 0

but for 0 < k < s

kb = (kt)a ≠ 0

which is to say that the cyclic subgroup {b} is of order s. Similarly,
the cyclic subgroup {c} of element c = sa has order t. The
intersection {b} ∩ {c} contains only zero because if kb = lc for
0 < k < s, 0 < l < t, then

(kt)a = (ls)a

whence, since the numbers kt and ls are less than n,

kt = ls

which is impossible due to the relative primality of the numbers s
and t. Finally, there are numbers u and v such that

su + tv = 1

and so

a = v(ta) + u(sa) = vb + uc

and, consequently, any element of the group {a} may be represented
as the sum of elements of the subgroups {b} and {c}.
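
The last step of this proof is constructive, and can be sketched as follows. The code assumes the cyclic group is the additive group of residues modulo n = st with generator a = 1; extended_gcd and split are hypothetical helper names introduced only for this illustration.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with a*u + b*v == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def split(x, s, t):
    """Write the residue x modulo s*t as a sum of an element of {b} and one of {c}.

    Here b = t*a and c = s*a with a = 1, so {b} consists of the multiples
    of t and {c} of the multiples of s.
    """
    n = s * t
    g, u, v = extended_gcd(s, t)
    if g != 1:
        raise ValueError("s and t must be relatively prime")
    # x = x*(s*u + t*v) = (x*v)*t + (x*u)*s
    in_b = (x * v * t) % n
    in_c = (x * u * s) % n
    return in_b, in_c

print(split(1, 3, 4))  # (4, 9)
```

For s = 3, t = 4 we get u = −1, v = 1, so that a = 1 is written as 4 + 9 modulo 12, where 4 lies in the subgroup of order 3 and 9 in the subgroup of order 4.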

We call an Abelian group G indecomposable if it cannot be de- 
composed into the direct sum of two or several of its subgroups dis- 
tinct from the zero subgroup. A finite cyclic group whose order 
is some power of the prime number p is called a primary cyclic group 
relative to the prime number p. Applying several times the assertion 
proved above, we find that any finite cyclic group can be decomposed 
into the direct sum of primary cyclic groups relative to distinct prime 
numbers. More precisely, a cyclic group of order

n = p_1^{k_1} p_2^{k_2} ... p_s^{k_s}

where p_1, p_2, ..., p_s are distinct prime numbers, can be decomposed
into the direct sum of s cyclic groups having orders p_1^{k_1}, p_2^{k_2}, ...,
p_s^{k_s}, respectively.


Every primary cyclic group is indecomposable. 

Indeed, suppose we have a finite cyclic group {a} of order p^k,
where p is prime. If this group were decomposable, then, by (7),
it would have nonzero subgroups whose intersection is zero.
Actually, however, every nonzero subgroup of our group contains the
nonzero element

b = p^{k-1}a

To prove this, take an arbitrary nonzero element x of our group,

x = sa, 0 < s < p^k

The number s may be written as

s = p^l s', 0 ≤ l < k

where the number s' is not divisible by p and, hence, is relatively
prime to it; and so there exist numbers u and v such that

s'u + pv = 1

Then

(p^{k-l-1}u)x = (p^{k-l-1}us)a = (p^{k-1}us')a
= p^{k-1}(1 − pv)a = (p^{k-1} − p^k v)a = p^{k-1}a − v(p^k a) = p^{k-1}a = b

which is to say, the element b is in the cyclic subgroup {x}.
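
This property is easy to verify by enumeration. The sketch below assumes the group of residues modulo p^k with generator a = 1, and takes p = 3, k = 3 as an arbitrary example; cyclic_subgroup is an illustrative helper.

```python
def cyclic_subgroup(x, n):
    """Cyclic subgroup {x} of the additive group of residues modulo n."""
    return {(m * x) % n for m in range(n)}

p, k = 3, 3
n = p ** k            # order of the primary cyclic group
b = p ** (k - 1)      # the element b = p^(k-1) a, with a = 1

# Every nonzero element x generates a subgroup containing b.
contains_b = all(b in cyclic_subgroup(x, n) for x in range(1, n))
print(contains_b)  # True

# For contrast, in a cyclic group of order 6 two nonzero subgroups
# may intersect in zero alone.
print(cyclic_subgroup(3, 6) & cyclic_subgroup(2, 6))  # {0}
```

Since every nonzero subgroup contains a nonzero cyclic subgroup, the check over all cyclic subgroups {x} covers all nonzero subgroups.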

The additive group of the integers (which is an infinite cyclic group) 
and also the additive group of all rational numbers are indecomposable 
groups. 

The indecomposability of both these groups follows from the 
fact that in each of them there exists, for any two nonzero elements, 
a nonzero common multiple; that is, any two nonzero cyclic sub- 
groups have a nonzero intersection. 
Note that if the operation in an Abelian group G is termed
multiplication, then instead of a direct sum we speak of a direct product.

The multiplicative group of nonzero real numbers can be decom- 
posed into a direct product of the multiplicative group of positive real 
numbers and a group, with respect to multiplication, made up of the 
numbers 1 and —1. 

Actually, the intersection of these two subgroups of our group
contains only the number 1, the unit element of this group. On the
other hand, every positive number is the product of itself
by the number 1, and every negative number is the product of its
absolute value by the number −1.


67. Finite Abelian Groups 


If we take any finite set of primary cyclic groups, some of which 
can refer to one and the same prime number or even have the same 
order, i.e., be isomorphic, then the direct sum of these groups is 
a finite Abelian group. It turns out that this exhausts all finite 
Abelian groups. 

Fundamental theorem of finite Abelian groups. Every finite 
Abelian group G which is not a zero group can be decomposed into a 
direct sum of primary cyclic subgroups. 

We begin the proof of this theorem with the remark that in the 
group G there will inevitably be nonzero elements of prime power orders. 
Indeed, if some nonzero element x of G has order l, lx = 0, and if
p^k, k > 0, is a power of the prime p that divides the number l,

l = p^k m

then the element mx is different from zero and has order p^k.
Let

p_1, p_2, ..., p_s (1)

be all distinct primes, some powers of which serve as the orders of
certain elements of the group G. Denote any such number by p
and the set of elements of G having powers of p as their orders by P.
The set P is a subgroup of the group G. Indeed, P includes the
element 0 since its order is 1 = p^0. Furthermore, if p^k x = 0, then
p^k(−x) = 0 as well. Finally, if p^k x = 0, p^l y = 0 and if, say,
k ≥ l, then

p^k(x + y) = 0

Thus, either the number p^k or a divisor of this number, at any rate
some power of p, serves as the order of the element x + y.

Alternately taking each of the numbers (1) for p, we obtain s
nonzero subgroups

P_1, P_2, ..., P_s (2)

The group G is the direct sum of these subgroups,

G = P_1 + P_2 + ... + P_s (3)


True, for if x is an arbitrary element of G, then its order l can
only be divisible by certain prime numbers of the system (1),

l = p_1^{k_1} p_2^{k_2} ... p_s^{k_s}

where k_i ≥ 0, i = 1, 2, ..., s. Therefore, as was demonstrated
at the end of Sec. 66, the cyclic subgroup {x} can be decomposed into
the direct sum of primary cyclic subgroups having orders p_1^{k_1},
p_2^{k_2}, ..., p_s^{k_s}, respectively. These primary cyclic subgroups lie in
the corresponding subgroups (2) and, consequently, the element x is
represented in the form of a sum of elements taken one each in all or
several of the subgroups (2). This proves the equality

G = {P_1, P_2, ..., P_s}

which is similar to (6) of Sec. 66.
To prove the equality similar to (7) of the same section, take
any i, 2 ≤ i ≤ s. Then any element y of the subgroup
{P_1, P_2, ..., P_{i-1}} is of the form

y = a_1 + a_2 + ... + a_{i-1}

where the element a_j, j = 1, 2, ..., i − 1, is in the subgroup P_j,
that is, has order p_j^{k_j}. Then,

(p_1^{k_1} p_2^{k_2} ... p_{i-1}^{k_{i-1}}) y = 0

For the order of the element y we have some divisor of the number
p_1^{k_1} p_2^{k_2} ... p_{i-1}^{k_{i-1}} and, consequently, the element y, if it is
different from zero, cannot be in the subgroup P_i. This proves that

{P_1, P_2, ..., P_{i-1}} ∩ P_i = 0

which is what we set out to prove.

Notice that an Abelian group, the orders of all the elements of 
which are powers of one and the same prime number p, is termed 
primary relative to p. Primary cyclic groups are a special case of 
primary groups. Thus, the subgroups (2) are primary. They are called 
primary components of the group G, and the direct decomposition (3) 
is called the decomposition of this group into primary components. 
Since the subgroups (2) are defined uniquely in the group G, it follows 
that the decomposition of G into primary components is likewise defined 
uniquely. 
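
The decomposition into primary components can be computed for a small group. The following sketch assumes, as an example of our own, the direct sum of cyclic groups of orders 2 and 6, with elements stored as pairs of residues; the helper names add, order, is_power_of and primary_component are illustrative.

```python
from itertools import product

MODULI = (2, 6)   # a noncyclic Abelian group of order 12 (an arbitrary example)
G = list(product(*(range(m) for m in MODULI)))
ZERO = tuple(0 for _ in MODULI)

def add(x, y):
    """Componentwise addition modulo the respective orders."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))

def order(x):
    """Least k > 0 with kx = 0."""
    k, y = 1, x
    while y != ZERO:
        y = add(y, x)
        k += 1
    return k

def is_power_of(p, n):
    """True if n = p^j for some j >= 0."""
    while n % p == 0:
        n //= p
    return n == 1

def primary_component(p):
    """All elements whose order is a power of the prime p."""
    return [x for x in G if is_power_of(p, order(x))]

P2, P3 = primary_component(2), primary_component(3)
sums = [add(x, y) for x in P2 for y in P3]

print(len(P2), len(P3))          # 4 3
print(sorted(sums) == sorted(G)) # True
```

The components have orders 4 and 3, and each of the 12 elements of the group arises exactly once as a sum of an element of the first component and an element of the second, as the direct decomposition (3) requires.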

Quite naturally, the decomposability of any finite Abelian group 
into the direct sum of primary groups reduces the proof of the fun- 
damental theorem to the case of a finite primary Abelian group P 
relative to some prime number p. Let us consider this case. 
Let a_1 be one of the elements of the group P having the highest
order in it. Furthermore, if in P there are nonzero elements, the
intersection of the cyclic subgroups of which with the cyclic
subgroup {a_1} is zero only, then by a_2 we denote one of the elements of
the highest order among the elements with this property; thus,

{a_1} ∩ {a_2} = 0


Let the elements a_1, a_2, ..., a_{i-1} be already chosen. Denote
by {a_1, a_2, ..., a_{i-1}} the subgroup of the group P generated by
their cyclic subgroups:

{{a_1}, {a_2}, ..., {a_{i-1}}} = {a_1, a_2, ..., a_{i-1}} (4)

It evidently consists of all the elements of P that can be written as
the sum of multiples of the elements a_1, a_2, ..., a_{i-1}. We will say
that this subgroup is generated by the elements a_1, a_2, ..., a_{i-1}.
Let us now denote by a_i one of the elements of the highest order
among those elements of P whose cyclic subgroups have a zero
intersection with the subgroup {a_1, a_2, ..., a_{i-1}}. Thus

{a_1, a_2, ..., a_{i-1}} ∩ {a_i} = 0 (5)
Because of the finiteness of the group P, this process must
terminate. Suppose this occurs after the elements a_1, a_2, ..., a_s have
been chosen. If by P' we denote the subgroup generated by these
elements,

P' = {a_1, a_2, ..., a_s}

i.e.,

P' = {{a_1}, {a_2}, ..., {a_s}} (6)

then, consequently, a cyclic subgroup of any nonzero element of the
group P has a nonzero intersection with the subgroup P'.

The equality (6) and the equality (5), which holds true for
i = 2, 3, ..., s, show that, by (4) and the criterion of Sec. 66,
the subgroup P' is the direct sum of the cyclic subgroups
{a_1}, {a_2}, ..., {a_s},

P' = {a_1} + {a_2} + ... + {a_s} (7)


It remains to prove that the subgroup P’ does indeed coincide with 
the entire group P. 
Let x be any element of P having order p. Since

P' ∩ {x} ≠ 0

and the subgroup {x} has no nonzero subgroups different from itself
(recall that the order of a subgroup is a divisor of the order
of the group, and the number p is prime), the subgroup {x} is indeed
contained in the subgroup P' and, hence, x belongs to P'. Thus, all
elements of order p of the group P lie in the subgroup P'.


Now suppose it has been proved that all elements of P whose
order does not exceed the number p^{k-1} are in the subgroup P', and
let x be any element of P having order p^k. As the choice of the
elements a_1, a_2, ..., a_s shows, their orders do not increase and so we
can indicate an i, 1 ≤ i − 1 ≤ s, such that the orders of the elements
a_1, a_2, ..., a_{i-1} are greater than or equal to p^k, and for i − 1 < s
the order of the element a_i is strictly less than this number, that
is to say, less than the order of the element x. Whence it follows,
by the conditions to which the choice of the element a_i is subject,
that if

Q = {a_1, a_2, ..., a_{i-1}}

then

Q ∩ {x} ≠ 0


However, in Sec. 66 it was proved that any nonzero subgroup
of a primary cyclic group $\{x\}$ of order $p^k$ contains the element

$$y = p^{k-1} x \tag{8}$$

Consequently, the element $y$ lies in the intersection $Q \cap \{x\}$ and
therefore in the subgroup $Q$ as well. This enables one to write $y$
as the sum of multiples of the elements $a_1, a_2, \ldots, a_{i-1}$,

$$y = l_1 a_1 + l_2 a_2 + \ldots + l_{i-1} a_{i-1} \tag{9}$$

From (8) it follows that the element $y$ has order $p$. Therefore,

$$(p l_1) a_1 + (p l_2) a_2 + \ldots + (p l_{i-1}) a_{i-1} = 0$$

That is to say, because of the existence of the direct decomposition (7),

$$(p l_j) a_j = 0, \quad j = 1, 2, \ldots, i-1$$

The number $p l_j$ must thus be divisible by the order of the element $a_j$,
and therefore also by the number $p^k$, whence it follows that $p^{k-1}$
divides $l_j$:

$$l_j = p^{k-1} m_j, \quad j = 1, 2, \ldots, i-1 \tag{10}$$


Let

$$z = m_1 a_1 + m_2 a_2 + \ldots + m_{i-1} a_{i-1}$$

This will be an element of the subgroup $Q$ and therefore of the subgroup
$P'$ too; by (9) and (10),

$$y = p^{k-1} z \tag{11}$$

From (8) and (11) follows the equality

$$p^{k-1} (x - z) = 0$$

That is, the order of the element

$$t = x - z$$

does not exceed $p^{k-1}$ and, hence, by the induction hypothesis, $t$ is
contained in the subgroup $P'$. Therefore, the element $x$, as the sum of
two elements of $P'$, $x = z + t$, also belongs to the subgroup $P'$.
This proves that all elements of order $p^k$ of the group $P$ are
contained in $P'$.

Consequently, our inductive proof admits of the assertion that 
all elements of the group P enter into the subgroup P’, or P’ = P. 
This concludes the proof of the fundamental theorem. 
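
The selection procedure used in this proof can be tried out on a small example. The following Python sketch is only an illustration and not part of the original argument: it assumes that the primary group P is given concretely as a direct sum of residue-class groups (the hypothetical tuple `moduli`, whose entries should all be powers of one prime), and the helper names `cyclic`, `span` and `greedy_decomposition` are introduced here for convenience.

```python
from itertools import product


def add(x, y, moduli):
    """Componentwise addition in Z_{m_1} + ... + Z_{m_r}."""
    return tuple((a + b) % m for a, b, m in zip(x, y, moduli))


def cyclic(x, moduli):
    """The cyclic subgroup {x} as a set; its size is the order of x."""
    zero = tuple(0 for _ in moduli)
    elems, cur = {zero}, x
    while cur != zero:
        elems.add(cur)
        cur = add(cur, x, moduli)
    return elems


def span(subgroup, x, moduli):
    """Subgroup generated by an existing subgroup together with x.

    The group is Abelian, so this is the set of all sums s + c with s in
    the subgroup and c a multiple of x.
    """
    return {add(s, c, moduli) for s in subgroup for c in cyclic(x, moduli)}


def greedy_decomposition(moduli):
    """Choose a_1, a_2, ... as in the proof.

    Each new element has the highest order among the elements whose
    cyclic subgroup meets the subgroup generated so far only in zero.
    """
    zero = tuple(0 for _ in moduli)
    elements = list(product(*(range(m) for m in moduli)))
    generated, chosen = {zero}, []
    while True:
        candidates = [
            x for x in elements
            if x != zero and cyclic(x, moduli) & generated == {zero}
        ]
        if not candidates:  # the process terminates here
            return chosen, generated
        a = max(candidates, key=lambda x: len(cyclic(x, moduli)))
        chosen.append(a)
        generated = span(generated, a, moduli)
```

For instance, with `moduli = (4, 2, 8)` (so p = 2 and the group has 64 elements) the chosen elements have orders 8, 4, 2, and the subgroup they generate is the whole group, in agreement with the theorem.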

Incidentally, we have that a finite Abelian group is primary relative
to a prime number $p$ if and only if its order is a power of $p$. Indeed,
it was shown that any finite primary (with respect to $p$)
Abelian group $P$ can be decomposed into the direct sum of primary
(with respect to $p$) cyclic groups, and for this reason the order of the
group $P$ is equal to the product of the orders of these cyclic groups,
that is to say, it is a power of $p$. Conversely, if a finite Abelian group
has order $p^k$, where $p$ is prime, then the order of any one of its
elements is a divisor of this number, that is, it is also some power of $p$,
and therefore the group turns out to be primary relative to $p$.

The fundamental theorem does not yet exhaust the problem of 
a complete description of finite Abelian groups, since we have not 
precluded the possibility that the direct sums of two distinct sets 
of cyclic groups that are primary relative to certain prime numbers 
may prove to be isomorphic groups. Actually, this does not occur,
as the following theorem shows. 

If a finite Abelian group G is decomposed in two ways into a direct 
sum of primary cyclic subgroups, 


$$G = \{a_1\} + \{a_2\} + \ldots + \{a_s\} = \{b_1\} + \{b_2\} + \ldots + \{b_t\} \tag{12}$$


then both direct decompositions have one and the same number of direct 
summands, s = t, and it is possible to establish a one-to-one corres- 
pondence between these decompositions such that the appropriate sum- 
mands are cyclic groups of the same order, which is to say they are iso- 
morphic. 

Note, to begin with, that if, say, in the first of the direct decom- 
positions (12), we collect direct summands relative to a given prime 
p, then their direct sum will be a primary (relative to p) subgroup 
of the group G and even a primary component of this group, since 
its order is equal to the highest power of p that divides the order 
of the group G. Thus combining the direct summands in each of the 
decompositions (12), in both cases we obtain a decomposition of G 
into primary components, the uniqueness of which decomposition 
has already been noted above. 

This permits proving our theorem under the assumption that 
the group G is itself primary relative to the prime number p. Let the 
numbering of the direct summands in each of the decompositions 
(12) be chosen so that the orders of these summands do not increase, 
that is, the elements $a_1, a_2, \ldots, a_s$ have, respectively, the orders

$$p^{k_1}, p^{k_2}, \ldots, p^{k_s}$$

for

$$k_1 \ge k_2 \ge \ldots \ge k_s$$

while the elements $b_1, b_2, \ldots, b_t$ have the orders

$$p^{l_1}, p^{l_2}, \ldots, p^{l_t}$$

for

$$l_1 \ge l_2 \ge \ldots \ge l_t$$


If the assertion of our theorem were not valid, then there would
be an $i$, $i \ge 1$, such that

$$k_1 = l_1, \ \ldots, \ k_{i-1} = l_{i-1} \tag{13}$$

but

$$k_i \ne l_i$$

Naturally, $i \le \min(s, t)$, since for each of the decompositions (12)
the product of the orders of all direct summands is equal to the order
of the group $G$. We will show that our assumption leads to a
contradiction.

For example, let

$$k_i < l_i \tag{14}$$


Denote by $H$ the set of elements of the group $G$ whose orders do not
exceed $p^{k_i}$. This is a subgroup of the group $G$, since if $x$ and $y$ are
elements of $H$, then both $x + y$ and $-x$ have orders that do not
exceed the number $p^{k_i}$.

Note that the subgroup $H$ contains, for instance, the following
elements:

$$p^{k_1 - k_i} a_1, \ p^{k_2 - k_i} a_2, \ \ldots, \ p^{k_{i-1} - k_i} a_{i-1}, \ a_i, \ a_{i+1}, \ \ldots, \ a_s$$

On the other hand, if $1 \le j \le i - 1$ and $k_j > k_i$, then the element
$p^{k_j - k_i - 1} a_j$ has order $p^{k_i + 1}$ and therefore is not in $H$. From this
it follows that the coset $a_j + H$ (recall that we are using the additive
notation!) has, as an element of the factor group $G/H$, the order
$p^{k_j - k_i}$; if $k_j = k_i$, the element $a_j$ itself lies in $H$ and the same
formula gives the order 1. Such also is the order of its cyclic subgroup
$\{a_j + H\}$. We will now prove that the group $G/H$ is the direct sum of
the cyclic subgroups $\{a_j + H\}$, $j = 1, 2, \ldots, i - 1$,

$$G/H = \{a_1 + H\} + \{a_2 + H\} + \ldots + \{a_{i-1} + H\} \tag{15}$$

and so its order is equal to the number

$$p^{(k_1 + k_2 + \ldots + k_{i-1}) - (i-1) k_i} \tag{16}$$


If $x$ is an arbitrary element of the group $G$, then there exists a
representation

$$x = m_1 a_1 + m_2 a_2 + \ldots + m_s a_s$$

Suppose for $j = 1, 2, \ldots, i - 1$,

$$m_j = p^{k_j - k_i} q_j + n_j$$

where

$$0 \le n_j < p^{k_j - k_i} \tag{17}$$

Then

$$m_j a_j = q_j (p^{k_j - k_i} a_j) + n_j a_j$$

and since the first summand of the right member is contained in $H$,
it follows that

$$m_j a_j + H = n_j a_j + H$$

On the other hand,

$$m_i a_i + H = H, \ \ldots, \ m_s a_s + H = H$$

And so

$$x + H = (m_1 a_1 + H) + (m_2 a_2 + H) + \ldots + (m_s a_s + H) = (n_1 a_1 + H) + (n_2 a_2 + H) + \ldots + (n_{i-1} a_{i-1} + H) \tag{18}$$

Let there also be the representation

$$x + H = (n'_1 a_1 + H) + (n'_2 a_2 + H) + \ldots + (n'_{i-1} a_{i-1} + H) \tag{19}$$

where

$$0 \le n'_j < p^{k_j - k_i}, \quad j = 1, 2, \ldots, i-1 \tag{20}$$

Then the elements

$$n_1 a_1 + n_2 a_2 + \ldots + n_{i-1} a_{i-1}$$

and

$$n'_1 a_1 + n'_2 a_2 + \ldots + n'_{i-1} a_{i-1}$$

lie in one coset relative to $H$, i.e., their difference belongs to $H$ and
therefore

$$p^{k_i} [(n_1 - n'_1) a_1 + (n_2 - n'_2) a_2 + \ldots + (n_{i-1} - n'_{i-1}) a_{i-1}] = 0$$

From this it follows [since the first of the decompositions (12) is
direct] that

$$p^{k_i} (n_j - n'_j) a_j = 0, \quad j = 1, 2, \ldots, i-1$$

and so the number $p^{k_i} (n_j - n'_j)$ must be divisible by the order $p^{k_j}$
of the element $a_j$ and, hence, the difference $n_j - n'_j$ is divisible by
the number $p^{k_j - k_i}$. Whence, by (17) and (20), it follows that

$$n_j = n'_j, \quad j = 1, 2, \ldots, i-1$$

which means that the representations (18) and (19) are identical. This
proves the existence of the direct decomposition (15).
Analogous arguments relative to the second of the direct
decompositions (12) will show that this same factor group $G/H$ has the
direct decomposition

$$G/H = \{b_1 + H\} + \{b_2 + H\} + \ldots + \{b_{i-1} + H\} + \{b_i + H\} + \ldots$$

That is, by (13) and (14), its order must be strictly greater than the
number (16). This contradiction proves the theorem.
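
The order (16) of the factor group can be checked numerically on small groups. The Python sketch below is an illustration under stated assumptions, not the book's method: G is modelled as the direct sum of the residue-class groups of orders p^{k_1}, ..., p^{k_s}, the exponents are passed in non-increasing order, and the function names `quotient_order` and `formula_16` are hypothetical. The first function counts the elements of H directly; the second evaluates expression (16).

```python
from itertools import product


def quotient_order(p, exponents, i):
    """Order of G/H, where G = Z_{p^k_1} + ... + Z_{p^k_s} and H consists
    of the elements whose order does not exceed p^{k_i} (i is 1-based)."""
    moduli = [p ** k for k in exponents]
    bound = p ** exponents[i - 1]
    size_g = 1
    for m in moduli:
        size_g *= m
    # x lies in H exactly when bound * x = 0 in every component
    size_h = sum(
        1
        for x in product(*(range(m) for m in moduli))
        if all((bound * c) % m == 0 for c, m in zip(x, moduli))
    )
    return size_g // size_h


def formula_16(p, exponents, i):
    """The number p^((k_1 + ... + k_{i-1}) - (i-1) k_i) from (16)."""
    return p ** (sum(exponents[: i - 1]) - (i - 1) * exponents[i - 1])
```

For p = 2 and exponents (3, 2, 1), both functions give 1, 2 and 8 for i = 1, 2, 3.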

We have thus obtained a complete survey of the finite Abelian 
groups. Namely, we take all possible finite sets of the natural numbers 


$$(n_1, n_2, \ldots, n_k)$$


different from unity, but not necessarily distinct; each one of these num- 
bers must be a power of some prime number. To each such set we asso- 
ciate the direct sum of cyclic groups whose orders are numbers from 
this set. All the finite Abelian groups thus obtained are pairwise noni- 
somorphic, and any other finite Abelian group is isomorphic to one 
of these groups. 
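
This survey can be turned into an explicit listing for a given order. The Python sketch below is a minimal illustration, assuming that a group is recorded simply by the tuple of prime-power orders of its cyclic direct summands; the function names `factorize`, `partitions` and `abelian_groups` are introduced here and are not from the text. For each prime p dividing n to the power k, the possible sets of orders correspond to the ways of writing k as a sum of positive integers.

```python
from itertools import product


def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def abelian_groups(n):
    """One tuple of prime-power orders per isomorphism class of
    Abelian groups of order n."""
    per_prime = [
        [tuple(p ** e for e in part) for part in partitions(k)]
        for p, k in sorted(factorize(n).items())
    ]
    return [sum(choice, ()) for choice in product(*per_prime)]
```

For example, `abelian_groups(8)` returns `[(8,), (4, 2), (2, 2, 2)]`, and `abelian_groups(12)` returns `[(4, 3), (2, 2, 3)]`.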
of second and third order 22ff 
second-order 23 
skew-symmetric 42 
of a system 54 
theory of 12 
axiomatic construction of 103f 
third-order 25, 37 
Vandermonde 49, 329, 336 
Determinate system 16 
Diagonal, principal 16 
Diagonal matrices 371 
Diagonalization of a matrix 203 
Difference 261 
Differential algebra 11 
Differentiating a sum and a product, 
formulas for 142 
Dimension of a space 185 
Diophantos of Alexandria 12 
Direct decomposition 400 
Direct product 406 
Direct sum 400, 403 
Discriminant 326, 334 
of an equation 228 
of a quadratic equation 335 


Disjoint cycles 35 
Dividend (of a polynomial) 306 
Divisible 134 
exactly 134 
Divisibility of polynomials 131-133, 305 
Division in a field, uniqueness of 267 
Division algorithm 129, 134 
Divisor(s) 1341f 
common 133 
elementary (of a matrix) 376 
elementary (of a polynomial) 376 
greatest common 131f, 133 
of integers 133 
of polynomials 133, 135, 138 
normal 394f, 398 
of a polynomial 306 
of unity 285 
Duncan, W.J. 414 


Eigenvalues 199f 
Eigenvector 200 
Eilenberg, S. 444 
Eisenstein criterion 344-345 
Element(s) 

component of 400 

conjugate 395 

identity (of a group) 385 

of infinite order 389 

inverse 179, 384 

multiples of 389 

opposite 179 

power of 389 

prime (of a ring) 285 

of a set 261 

unit 269, 383, 385 

zero 180, 264 
Elementary algebra 7 
Elementary divisors of a matrix 376 
Elementary divisors of a polynomial 376 
Elementary matrix 363 
Elementary symmetric polynomials 313 
Elementary transformations 74 

of a matrix 355 
Elimination of unknown 326, 334 
Equalizing coefficients, method of 23 
Equation(s) 

cubic 226 

incomplete 226 
with real coefficients 228 

general theory of 12 

higher-degree 231 

homogeneous linear (see systems of 

h.l. eqs.) 
nonhomogeneous (see system of n. 


eqs.) 
nth-degree 232 
quadratic 225 
quartic 230 
quintic 232 
of second, third, fourth degree 225ff 
solvability of by radicals 12 
systems of linear (see systems of 1. 
eqs.) 
Equivalence of λ-matrices 355ff 
Equivalence relation 356 
Euclidean algorithm (see Euclid’s a.) 
Euclidean space(s) 204 
isomorphic 208 
isomorphism of 208ff 
n-dimensional 204, 205 
Euclid’s algorithm 133, 136, 241 
Expansion of a determinant 47 
Extensions 274ff 


Factor(s) 
double 284 
invariant 364 
k-fold 284 
multiple 284, 287 
isolation of 288 
simple 284 
Single 284 
triple 284 
Factor groups 394, 395, 396 
examples of 397 
Factorization of polynomials 284 
into irreducible factors 284f 
Faddeyev, D.K. 414 
Faddeyeva, V.N. 414 
False position, method of 254 
Ferrari, L. 12, 230 
Ferro, S. del 42 
Fibonacci, L. (see Leonardo of Pisa) 12 
Field(s) 9, 267f 
of algebraic functions, theory of 
10, 13 
of algebraic numbers, theory of 10, 
4 


characteristic of 270 

commutative 302 

of complex numbers 
construction of 273, 275, 295 
uniqueness of 272 

concept of 257 

definition of 267 

division in, uniqueness of 267 

finite 268 
general theory of 13 
number 257, 259, 274 
obtained by adjoining an element 
27 
of rational fractions 297ff 
of rational numbers 260, 344 
splitting 296 
Finite Abelian groups 406ff 
fundamental theorem on 406 


Finite characteristic 271 
Finite cyclic group 394 
Finite fields 268 
Finite group 383 
Finite rings 268 


Finite-dimensional linear space 183 
Finite-dimensional spaces 182 
Finite-dimensional unitary spaces 210 
Form(s) 
cubic 306 
of degree s 306 
Jordan normal 370f 
reduction of a matrix to 375 
linear 62, 306 
negative definite 177 
normal 169, 170 
of a matrix 355ff 
pairs of 249, 223 
positive definite 174ff 
quadratic (see also quadratic form) 
0 


quartic 306 
theory of 8 
trigonometric (of complex number) 


Formula(s) 
Cardan’s 227, 229 
De Moivre’s 120 
for differentiating a sum and a pro- 
duct 142 
Lagrange interpolation 153 
Newton’s 323 
Taylor’s 145 
Vieta’s 154, 296, 343 
Fourier (see Budan-Fourier Theorem) 
Fraction(s) 
partial 157 
rational 156, 298 
field of 297ff 
in lowest terms 156 
proper 156 
simplified 156 
symmetric 321 
symmetric rational 321 
Fractional rational functions 156 
Frazer, R.A. 444 
Free unknowns 79 
Frobenius, F.G. 13 
Function(s) 
continuous 144 
fractional rational 156 
rational integral 337 
symmetric 312 
Functional analysis 9, 10 
Fundamental system of solutions 84 
Fundamental theorem 
of the algebra of complex numbers 
4 43ff 
alternative proof of 337f 
corollaries to 154ff 
on finite Abelian groups 406 
of higher algebra 143 
on the similarity of matrices 367 
on symmetric polynomials 314, 316, 
319 


Galois, E. 9, 12, 13, 232 
theory of 44, 13 

Gantmacher, F.R. 414 

Gauss, C.F. 12, 143 

Gauss’ (or Gaussian) elimination pro- 
cess 24 

Gauss’ (or Gaussian) lemma 307, 342 

Gauss’ (or Gaussian) method 17, 18, 
0 


Gelfand, I.M. 414 
Generate (verb) (subgroup generated 
by subgroups) 401 
Generation of a linear subspace 196 
Generator 390 
Geometry 
algebraic 9, 326 
projective 14 
Graeffe method 256 
Grassmann, H. 43 
Grave, D.A. 13, 415 
Greatest common divisor 134f, 133 
of integers 133 
of polynomials 134, 135, 138 
Group(s) 10, 382ff 
Abelian 383, 385-387 
finite 406ff 
indecomposable 405 
primary 407 
addition in 385 
additive 385 
alternating 387 
commutative 383 
continuous 11, 13 
theory of 13 
cyclic 390ff 
primary 407 
decomposition of 391 ff 
definition of 382, 383 


factor 394, 395, 396 
examples 397 

finite 383 

finite Abelian 406ff 
complete survey of 413 
fundamental theorem on 406 

finite cyclic 394 

general theory of 413 

infinite cyclic 390, 394 

isomorphic 385 

Lie 14 

multiplication in 382 

multiplicative 386, 391 

noncommutative 387 

order of 383 

primary 407 

primary Abelian 407 

primary cyclic 405, 407 

theory of 10, 382 
Soviet school of 414 

Gurevich, G.B. 444 


Hamilton, W.R. 413 

Hamilton (see Cayley-Hamilton theo- 
rem) 

Hecke, E. 415 

Height of a polynomial 353 

Higher algebra 7, 8 

Higher-degree equations 234 

Highest term of a polynomial 341 

“Hisab al-jabr w’al-muga-balah” 12 

Hodge, W. V.D. 415 

Holder, O. 413 

Homogeneous linear equations (see 
systems of h.l. eqs.) 

Homogeneous polynomial 306 

Homological algebra 13 

Homomorphic mapping 397 

Homomorphism(s) 394, 397 
canonical 398 
theorem on 398 

Horner method 140, 141 

Hurwitz, A. 13 

Hypercomplex numbers, theory of 10 

Hypercomplex systems, theory of 13 


Ideals, theory of 10, 13 

Identity element of a group 383, 385 
Identity matrix 93 

Identity permutation 34 
Identity transformation 189, 195, 214 
Image 188 
Imaginaries 

axis of 112 

pure 112 
Imaginary part 112 
Imaginary unit 142 
Incomplete cubic equation 226 
Inconsistent system of linear equa- 

tions 16 
Indecomposability of groups 405 
Indecomposable Abelian group 405 
Indefinite quadratic forms 177 
Indeterminate system 16 
Index 

of inertia 

negative 172 
positive 172 

of a subgroup 393 
Inertia 

law of 169f, 170 

negative index of 172 

positive index of 172 
Infinite cyclic group 390, 391 
Infinite-dimensional linear spaces 184 
Infinite-dimensional spaces 9 
Integers, system of 107 
Integral rational functions 156 
Interpolation, linear, method of 254 
Interpolation formula, Lagrange 153 
Invariant (adj.) 244 
Invariant factors 364 
Invariant subgroup 394 
Invariants, theory of 9 
Inverse (to a class) 304 
Inverse of a permutation 33 
Inverse element 179, 384 
Inverse linear transformation 199 
Inverse matrices 93 
Inverse matrix 

left 94 

right 94 
Inverse operation 264 
Inverse polynomial 129 
Inverse transformation 199 
Inversion 29 
Irrational numbers 107 
Irreducible (of a polynomial) 281, 306 
Irreducible (of a solution) 230 
Isomorphic (adj.) 272 
Isomorphic correspondence 182 
Isomorphic Euclidean spaces 208 
Isomorphic groups 385 
Isomorphic real linear spaces 184 
Isomorphism(s) 178, 184 

of Euclidean spaces 208ff 

of fields 272ff 

of rings 272ff 
Iterative procedures 58 


Jacobson, N. 414, 415 

Jordan, M.E.C. 13 

Jordan matrices 370 

Jordan matrix of order n 370 

Jordan normal form 370f 
reduction of a matrix to 375 

Jordan submatrix 374 


Kernel 

of a homomorphism 398 

of a linear transformation 197 
Khayyam, Omar, 12 
Kronecker, L. 13, 345 
Kronecker-Capelli theorem 77, 78, 84 
Kummer, E.E. 13 
Kurosh, A.G. 445 


Lagrange, J.L. 12, 13 
Lagrange interpolation formula 153 
Lagrange’s theorem 393 
Laplace, P.S. 12 
Laplace’s theorem 50, 54 
Lattice 11 
Lattice theory 11, 13 
Law of inertia 169f, 170 
Leading coefficient 126 
Left coset 392 
Left decomposition 392 
Left-identity 384 
Left-inverse 385 
Left inverse matrix 94 
Lemma (see theorem) 
d’Alembert’s 147, 149 
Gauss’ (or Gaussian) 307, 342 
on the increase of the modulus of 
a polynomial 146 
on the modulus of the highest-degree 
term 145 
Leonardo of Pisa (see Fibonacci) 12 
Lie, S. 13 
Lie groups, theory of 14 
Linear algebra 7, 8, 13, 15, 276 
Linear combination of vectors 62 
Linear dependence of vectors 62ff 
Linear equations (see systems of 1. 
eqs.) 
Linear form 62, 306 
Linear interpolation, method of 254 
Linear polynomials 127, 139 
Linear spaces 7, 178ff 
complex 202, 209 
finite-dimensional 183 
infinite-dimensional 181 
n-dimensional 185 
Linear subspaces 195ff, 202 
generation of 196 
Linear substitution 87 
Linear transformation(s) 87, 89, 188f 
inverse 199 
kernel of 197 
nonsingular 93, 198 
nonsingularity of 224 
null space of 197 
operations on 193 
product of 193 
by a scalar 193 
rank of 197 
with a simple spectrum 202 
singular 93 
spectrum of 200 
Linearly dependent system of vectors 
Linearly independent system of vec- 
tors 63 
Lobachevsky, N.I. 13 
method of 256 
Lyapin, E.S. 414, 415 


Maltsev, A.I. 444 
Mapping, homomorphic 397 
Matrices (see also matrix) 
diagonal 374 
fundamental theorem on the simi- 
larity of 367 
inverse 93ff 
Jordan 370 
λ-matrices 355 
canonical 356, 357 
equivalence 355ff 
equivalent 356 
unimodular 362ff 
of a linear transformation in diffe- 
rent bases, relationship between 
194 
noncommutative 90 
numerical 355 
orthogonal 210ff, 214 
polynomial 355 
product of 124 
rectangular 70 
multiplication of 97 
scalar 102 
similar 192, 200 
similarity of, fundamental theorem 
on 367 
square, similar 192 
theory of 8 
Matrix (see also matrices) 16 
adjoint of 94 


augmented 21 
change-of-basis 186 
characteristic 199 
definition of 23 
diagonalization of 203 
elementary 363 
elementary divisors of 376 
elementary transformations of 355 
identity 93 
Jordan (of order n) 370 
left-inverse 94 
multiplication of by a scalar 99, 100 
normal form of 355ff 
of a quadratic form 162 
reduction of to diagonal form 75, 
203 
reduction of to Jordan normal form 
375 
right-inverse 94 
square 93 
nonsingular 93 
of order n 16 
singular 93 
transformations of, elementary 355 
unit 16, 93, 195, 214 
zero 100, 195 
Matrix addition 99 
Matrix multiplication 87ff, 89 
Matrix polynomials 365f 
Matrix root of a polynomial 378 
Maximal linearly independent system 
of vectors 65, 68 
Method 
alphabetical 310 
of equalizing coefficients 23 
of false position 254 
Graeffe 256 
Horner 140, 141 
iterative (see iterative procedure) 
of linear interpolation 251 
of Lobachevsky 256 
Newton’s 236, 252, 253 
Sturm’s 238 
Minimal polynomials 377ff 
Minor(s) 43ff 
complementary 43 
kth-order (of a matrix) 70 
of order k 43 
principal (of a form) 175 
Modulus 113 
of a product of complex numbers 115 
of a quotient of two complex numbers 116 
of a sum 117 
Molin, F.E. 13 
Multidimensional space 7 
Multidimensional vector spaces 59 
Multiplication 264 
of classes 268 
in a group 382 
matrix, associativity of 96 
of a matrix by a scalar 99, 100 
noncommutativity of 90 
of rectangular matrices 97 
scalar 204 
of vectors by a scalar 64 
Multiplication theorem for determi- 
nants 91, 93 
Multiplicative group 386, 391 
Multiple(s) 
of an element 389 
zero 265 
Multiple factors 284, 287 
isolation of 288 
Multiple roots 144 
Multiplicity of a root 144, 152 
Murnaghan, F.D. 415 


Negative definite forms 177 
Negative index of inertia 172 
Newton, Isaac 12 
Newton’s binomial theorem 120 
Newton’s formulas 323 
Newton’s method 236, 252, 253 
Noether, E. 9 
Noether, M. 13 
Nonassociative rings 267 
Noncommutative groups 387 
Noncommutative matrices 90 
Noncommutative ring 266 
Noncommutativity of multiplication 
90 
Noncommutable set 352 
Nonhomogeneous equations 83 
Nonhomogeneous system 83 
Nonsingular linear transformations 93, 
198 
Nonsingular quadratic form 162 
Nonsingular square matrix 93 
Nonsingular transformation 241 
Nonsingularity of a linear transfor- 
mation 224 
Norm of a number 286 
Normal divisors 394f, 398 
Normal form 169, 170 
of a matrix 355ff 
Normalization of a vector 208 
Normalized vector 207 
Notation, additive 400 
Null space of a linear transformation 
197 
Nullity of a transformation 197, 198 
Number(s) 
algebraic 349f 
conjugate 350 
set of 350 
Cayley 141 
complex 107, 110, 142ff 
raising to a power 120 
taking roots of 120ff, 122, 123 
taking the square root of 122 
conjugate 118 
conjugate complex 118 
hypercomplex 10 
irrational 105 
rational 105 
field of 344 
real 105 
transcendental 349, 353, 354 
Number fields 257, 260, 274 
Number rings 257, 258, 259 
Numerical matrices 355 


Okunev, L.Ya. 414, 445 
Omar Khayyam 12 
Operation 
algebraic 264 
inverse 264 
Opposite class 294 
Opposite element 179 
Order of a group 382 
Orthogonal bases 207 
Orthogonal matrices 210ff, 214 
Orthogonal system (of vectors) 206 
Orthogonal transformation(s) 210ff 
of Euclidean space 212 
Orthogonalization process 206, 207 
Orthonormal bases 204, 208 
Orthonormal basis 208 


Parity of permutations 34 
Part 
imaginary 112 
real 112 
Partial fraction 157 
Pedoe, D. 415 
Permutation(s) 27ff 
cyclic 34 
decrement of 36 
definition of 28 
of degree n (definition) 30, 32 
even 32 
identity 31 
inverse of 33 
multiplication of 32 
odd 32 
parity of 34 
Pi (π), transcendence of 259 
Plane, complex 112 
Polar angle 143 
Polynomial(s) 156 
algebra of over an arbitrary field 276 
algebraic viewpoint of 127 
alphabetic order of terms of 310, 314 
is annihilated by a linear transfor- 
mation 381 
characteristic (of a matrix) 200 
cubic 127 
cyclotomic 345 
decomposition of 284 
definition of 127 
degree of 303 
of degree n 127 
of degree one 139 
of degree zero 127, 129 
derivative of 141 
dividend of 306 
divisibility of 134-133, 305 
divisor of 306 
elementary divisors of a 376 
equal 127, 303 
evaluating roots of 225ff 
factorization of 284 
into irreducible factors 284f 
first-degree 127 
as a formal algebraic expression 127 
function-theoretic viewpoint of 127 
greatest common divisor of 134 
highest term of 344 
homogeneous 306 
identically equal 127, 303 
integral, rational roots of 345ff 
inverse 129 
irreducible 281, 306 
linear 127, 139 
matrix 365f 
minimal 377ff 
nth-degree 127 
operations on 126ff 
primitive 342 
quadratic 127 
quotient of 134 
with rational coefficients 341 ff 
with real coefficients 155 
reducibility of over the field of 
rationals 344ff 
reducible 281, 306 
relatively prime 133 
theorems on 137 
remainder in division of 134 
ring of 279, 304 
roots of 139ff 


in several unknowns 303ff 
sum of 304 
symbols for 127 
symmetric 342ff, 349ff 
elementary 313 
fundamental theorem on 314, 346, 
9 
in two systems of unknowns 324 
value of 139, 377, 384 
from viewpoint of mathematical 
analysis 127 
Polynomial matrices 355 
Pontryagin, L.S. 415 
Position, false, method of 254 
Positive definite forms 174ff 
Positive definite quadratic forms 174ff 
Positive definiteness of a form 177 
Positive index of inertia 172 
Postmultiplication 94, 99 
Power 
of an element 389 
raising complex numbers to a 120 
zero 389 
Power sums 322 
Premultiplication 99 
Primary Abelian group 407 
Primary components 407 
Primary cyclic groups 405, 407 
Primary group (subgroup) 407 
Prime element of a ring 285 
Primitive nth roots of unity 125 
Primitive polynomial 342 
Primitive root 394 
Principal-axis theorem 219, 220 
Principal diagonal 16 
Principal minors of a form 175 
Product 
of classes 294, 301 

direct 406 

of matrices 89 

scalar (of vectors) 205 
Projective geometry 11 
Proper rational fraction 156 
Proskuryakov, I.V. 414 
Pure imaginaries 112 


Quadratic equations 225 

Quadratic form(s) 306 
canonical 164 
complex 162 
decomposable 172 
definition of 162 
indefinite 177 
matrix of 162 
negative definite 177 
nonsingular 162 
positive definite 174ff 
rank of 162 
real 162 
reduction of to canonical form 161 ff 
theory of 168 
reduction of to principal axes 168, 
24 Off 
semidefinite 177 
theory of 164 
Quadratic polynomial 127 
Quadric curves and surfaces, theory 
of 164 
Quartic equations 230 
Quartic form 306 
Quasigroups, theory of 13 
Quaternions 114 
Quintic equations 232 
Quotient 267 
of a polynomial 134 


Radius vector 113 
Range of values (of a transformation) 
197 
Rank 
of a linear transformation 197 
of a matrix 69ff 
evaluating 72 
of a product of matrices 98 
of a quadratic form 162 
of a system of vectors 68 
Rank theorem 72, 74 
Rational fractions 156f, 298 
field of 297ff 
in lowest terms 156 
proper 156 
simplified 156 
Rational numbers 107 
field of 344 
Rational roots of integral polynomials 
345ff 


Real linear spaces 178 

Real numbers 107 

Real part 112 

Reals, axis of 112 

Rectangular matrices 70 
multiplication of 97 

Reduced system 86 

Reducibility of polynomials over the 
field of rationals 341 ff 

Reducible (of a polynomial) 281, 306 

Reduction 
of a matrix to diagonal form 203 
of a matrix to Jordan normal form 375 


of quadratic forms to canonical 
form 164ff 
of quadratic forms to principal axes 
168, 219ff 
Regula falsi 254 
Relation, equivalence 356 
Relatively prime polynomials 133 
theorems on 137 
Relatively prime system of polyno- 
mials 138 
Remainder of polynomials (in division) 
1 


Resultant 326, 327, 330 
Right decomposition 392 
Right-identity 384 
Right-inverse 384 
Right inverse matrix 94 
Ring(s) 10, 260ff 
commutative 267 
concept of 257 
definition of 262 
examples of 262 
finite 268 
of functions 262 
nonassociative 267 
noncommutative 266 
number 257, 258, 259 
of polynomials 279, 304 
theory of 10, 43 
Root(s) 
approximation of 250ff 
bounds of 232ff 
characteristic 199ff, 216 
of complex numbers 120ff, 122, 123 
k-fold 144 
matrix 378 
multiple 144 
of polynomials 139ff, 378 
primitive 391 


rational (of integral polynomials) 
345ff 
simple 141 


theorem on the existence of a 290f 
theorems on the number of real 244f 
of unity 124ff 
primitive nth 125 
Ruffini, P. 12 


Scalar matrices 102 

Scalar multiplication 204 

Scalar product of vectors 204 
Schmidt, O.Yu. 14, 445 
Schreier, O. 414 

Self-adjoint transformation 245 
Semidefinite quadratic forms 177 
Semigroups, theory of 13 
Sequence, Sturm’s 239 
Set 
countable 352 
denumerable 352 
noncountable 352 
Shapiro, G.M. 414 
Shatunovsky, S.O. 13 
Shilov, G.E. 414 
Signature of a form 172 
Similar matrices 192, 200 
Similar square matrices 192 
Similarity of matrices, fundamental 
theorem on 367 
Simple factor 284 
Simple root 141 
Simple spectrum 202, 203 
Simplified rational fraction 156 
Single factor 284 
Singular linear transformation 93 
Singular square matrix 93 
Skew-symmetric determinant 42 
Solvability of equations by radicals 12 
Sominsky, I.S. 414 
Space(s) 
complex linear 181, 202 
Euclidean (see also Euclidean spa- 
ce) 204 
finite-dimensional 182 
four-dimensional 7 
of functions 185 
infinite-dimensional 9 
linear 7, 178ff 
finite-dimensional 183 
infinite-dimensional 184 
n-dimensional 185 
multidimensional 7 
null 197 
real affine 178 
real linear 178, 1814 
isomorphic 184 
real vector 178 
of sequences 185 
unitary 209 
finite-dimensional 210 
vector (see also vector spaces) 7 
theory of 9 
Spectrum 
of a linear transformation 200 
simple 202, 203 
Sperner, E. 444 
Splitting field 416 
Square matrix 93 
Sturm method 238 
Sturm sequence 239 
Sturm theorem 238ff 


Subfields 274 ff 
Subgroup(s) 388ff 
cyclic 389 
generated by subgroups 404 
invariant 394 
primary 407 
unit 389 
Submatrix, Jordan 371 
Subspace(s) 
linear 195ff, 202 
generation of 196 
zero 195 
Substitution, linear 87 
Subtraction 261 
Successive elimination of unknowns, 
method of 15, 17 
Sum (s) 
of classes 293, 300 
direct 400, 403 
of polynomials 304 
power 322 
Summands of a decomposition 100 
Sushkevich, A.K. 414, 415 
Sylow, 13 
Sylvester, J.J. 13 
Symmetric functions 312 
Symmetric polynomial in two systems 
of unknowns 324 
Symmetric polynomials 342ff, 349ff 
elementary 313 
fundamental theorem of 314, 316, 
319 
Symmetric rational fractions 321 
Symmetric transformations 245f 
System (s) 
of Cayley numbers 114 
of complex numbers 107ff 
definition of 140 
of homogeneous linear equations 82f 
solutions of 83 
of integers 107 
of linear equations 76 
arbitrary, solution of 79 
consistent 16 
determinate 16 
indeterminate 16 
general theory of 59ff 
inconsistent 16 
of nonhomogeneous equations 83 
orthogonal (of vectors) 206 
of quaternions 1411 
of rational numbers 107 
of real numbers 107 
reduced 86 
of solutions, fundamental 84 
of vectors 
equivalent 67 
linearly dependent 63, 64 

linearly independent 63 

maximal linearly independent 65, 
68 


Tartaglia, N. 12 
Taylor’s formula 145 
Tensor algebra 9 
Term 
degree of 303 
highest, of a polynomial 311 
Theorem(s) (see lemma) 
binomial 120 
Budan-Fourier 246, 249 
Cayley-Hamilton 380, 381 
Descartes’ 247, 249, 348 
on the existence of a root 290f 
on the existence of roots, fundamen- 
tal 12 
fundamental (of the algebra of com- 
plex numbers) 442ff 
alternative proof of 337f 
fundamental, corollaries to 154 
fundamental (on finite Abelian gro- 
ups) 406 
fundamental (of higher algebra) 143 
fundamental, on the similarity of 
matrices 367 
fundamental, on symmetric polyno- 
mials 314, 316, 319 
on homomorphisms 398 
Kronecker-Capelli 77, 78, 84 
Lagrange’s 393, 557 
Laplace’s 50, 54 
multiplication (for determinants) 
Newton's binomial 120 
on the number of real roots 244f 
principal-axis 219, 220 
rank 72, 74 
on relatively prime polynomials 137 
Sturm’s 238ff 
unique factorization 308 
Weierstrass 150, 214 
Theory of algebras 13 
Topological algebra 41, 43 
Topological properties of real and com- 
plex numbers 143 
Transcendence 
of e 349 
of π 259 
Transcendental numbers 349, 353, 354 
Transcendental over a field 279, 305 
Transform of an element 395 
Transformation(s) 
affine 244 


elementary 74, 102 
of a matrix 355 
identity 189, 195, 214 
inverse 199, 279 
linear (see linear transformations) 
87, 89, 188f 
nonsingular 93 
operations on 193 
singular 93 
nonsingular 241 
nullity of 197, 198 
orthogonal 210ff 
of Euclidean space 242 
range of values of 197 
self-adjoint 215 
symmetric 215f 
of vector coordinates 186 
zero 189, 195, 215 
Transpose 
of a determinant, taking 38 
of a matrix 162 
Transpose operation 38 
Transposition 34 
Trigonometric form (of complex num- 
bers) 144 


Unimodular λ-matrices 362ff 
Unique decomposition of a proper rational fraction 159 
example of 160 
Unique factorization theorem 308 
Unit, imaginary 1412 
Unit class 294, 304 
Unit element 269 
of a group 383, 385 
Unit matrix 16, 93, 195, 244 
Unit subgroup 389 
Unit vectors 64, 185 
Unitary space(s) 210 
finite-dimensional 210 
Unity 269 
divisor of 285 
primitive nth roots of 125 
roots of 124 
Universal algebras 14 
Unknowns, free 79 


Value of a polynomial 377, 381 
Van der Waerden, B.L. 415 
Vandermonde determinant 49, 329, 336 
Vector(s) 178 
examples of 60, 84 
multiplication of by a scalar 64 
n-dimensional 60 
normalized 207 
opposite 64 
unit 64, 185 
zero 64 

Vector space(s) 9, 178 
multidimensional 59 
n-dimensional 59, 60, 62 
theory of 9 

Vectorial angle 113 

Vieta (Viète) F. 42 

Vieta’s formulas 154, 247, 296, 343 

Vinogradov, S.P. 4414 

Voronoi, G.F. 43 


Waerden, van der, B.L. 415 
Weierstrass theorem 150 

Weight of a term 320, 332 

Weyl, H. 445 


Zero 109 

the number 61 
Zero class 294, 300 
Zero element 180, 264 
Zero matrix 100, 195 
Zero multiple 265 
Zero power 389 
Zero subspace 195 
Zero transformation 189, 195, 215 
Zero vector 60 
Zolotarev, E.I. 13 
lems. Covers questions in saddle-point method and Wiener-Hopf equations. 


ADVANCED MATHEMATICS 
FOR ENGINEERS. 


SPECIAL COURSES 
by A. Myskis, D.Sc. 


This manual presents the subject from the standpoint of modern applied 
mathematics, with maximum use of intuition and analogy; it pays special 
attention to both the qualitative and the quantitative description of facts. 
Designed for engineering students, teachers, engineers, and research workers 
in the applied sciences. Provides a useful bibliography. 
Contents. Scalar and Vector Fields. The Theory of Analytic Functions. 
Operational Calculus. Linear Algebra. Tensors. Calculus of Variations. 
Integral Equations. Ordinary Differential Equations. 


HANDBOOK 
OF HIGHER MATHEMATICS 


by M. Vygodsky, D.Sc. 


Intended for students and engineers, teachers and sixth-form pupils as a 
practical reference book, or as a compact study aid giving elementary 
acquaintance with the subject. Contains material on the history of 
mathematical ideas and brief biographical notes on the mathematicians who 
developed them. 
Contents. Analytical Plane Geometry. Analytical Solid Geometry. Basic 
Concepts of Mathematical Analysis. Differential Calculus. Integral Calculus. 
Lines in a Plane and Space. Differentiation and Integration of the 
Functions of Two or More Arguments. Differential Equations. Famous Curves. 
Tables of Logarithms. 


LECTURES 
IN HIGHER MATHEMATICS 


by A. Myskis, D.Sc. 


A textbook for students, revised for translation from the 3rd Russian edition. 
Contents. Introduction. Quantity and Function. Analytic Geometry on a Plane. 
Limit. Continuity. Derivatives. Differentials. Investigating the Behaviour 
of Functions. Approximate Solution of Finite Functions. Interpolation. 
Determinants and Systems of Linear Algebraic Equations. Vectors. Complex 
Numbers and Functions. Functions of Several Variables. Analytic Geometry 
in Space. Matrices and Their Applications. Partial Derivatives. 
Indefinite Integrals. Definite Integrals. Differential Equations. Multiple 
Integrals. Series. Elements of the Probability Theory. On Modern Computers. 


A PROBLEM BOOK 
IN MATHEMATICAL ANALYSIS 


G. Berman, D. Sc. 


A book of problems in mathematical analysis for engineering students. 
Contains a systematic selection of problems and exercises to go along with the 
sections of the course in mathematical analysis. Theoretical information on 
the necessary formulas is not included in this book. The reader will find it 
in the corresponding sections of the textbook Mathematical Analysis (A Brief 
Course for Engineering Students) by A.F. Bermant and I.G. Aramanovich, 
brought out in English by Mir Publishers in 1979. 

Contents. Function. Limit. Continuity. Derivative and Differential. 
Differential Calculus. Investigating Functions and Their Graphs. Definite 
Integral. Indefinite Integral. Integral Calculus. Methods of Computing Definite 
Integrals. Improper Integrals. Applications of Integrals. Series. Functions 
of Several Variables. Differential Calculus. Applications of Differential 
Calculus of Functions of Several Variables. Double and Triple Integrals 
and Multiple Integration. Curvilinear Integrals and Surface Integrals. 
Differential Equations. Trigonometric Series. Fundamentals of Field Theory. 
Answers. Appendix: Tables of Basic Elementary Functions. 


PROBLEMS IN HIGHER MATHEMATICS 
V. Minorsky 


A collection of 2570 problems in analytical geometry and mathematical 
analysis for university engineering students. Each section opens with the 
formulas, definitions and other theory needed to solve the problems 
that follow, and answers and hints on how to approach the solution 
are provided for many of the problems, so the book can be used either under 
the supervision of an instructor or for independent study by correspondence 
and extramural students. Each section closes with questions for revision, 
this material constituting one third of the volume of the book. This is the 
second English edition. 

Contents. Plane Analytical Geometry. Vector Algebra. Solid Analytical 
Geometry. Higher Algebra. Introduction to Analysis. Derivatives and 
Differentials. Application of Derivatives. Indefinite Integrals. Definite Integrals. 
Curvature of a Plane Curve and Compound Curvature. Partial Derivatives, 
Full Integrals, and Their Application. Differential Equations. Double and 
Triple Integrals and Integrals over a Curve. Series. 


ALGEBRA CAN BE FUN. 


Ya. Perelman 


This book comes from the pen of a talented popularizer of science, Yakov 
Perelman. The book aims at developing the reader's interest in algebra. The 
author employs various means for this purpose: problems with unusual 
subjects that arouse curiosity, entertaining excursions into the fields of 
history and mathematics, unexpected applications of algebra in day-to-day 
life, etc. The book covers material from the school curriculum, touching 
upon almost all its branches. 

The book is intended for senior secondary-school students and for those 
interested in mathematics. 

Contents. The Fifth Rule of Mathematics. Language of Algebra. To Help 
Arithmetic. Diophantine Equations. The Sixth Rule of Mathematics. Secondary 
Equations. Highest and Lowest Values. Progressions. The Seventh 
Mathematical Rule. 


Mir Publishers’ books in foreign languages 
are exported by V/O Mezhdunarodnaya Kniga 
and can be purchased or ordered through 
booksellers in your country dealing with V/O 
Mezhdunarodnaya Kniga, USSR 

(200, Moscow, USSR) 


